ABSTRACT


In this thesis, an attempt has been made to study and implement the concepts of
wavelet transforms and multiresolution approximation of images on a network of
transputers. The decomposition and reconstruction algorithms are derived first for
the one-dimensional case and then extended to the two-dimensional case, i.e., the case of
images. These algorithms are implemented on a linear array of eight transputers using the
data-partitioning approach. The performance of wavelet decomposition and reconstruction
on noise-corrupted images is also studied. All the program coding is done in Occam, the
native language of the transputers. Program development, testing and debugging are
done under the Transputer Development System environment. The performance of the
various basis functions used is compared.
CHAPTER 1


INTRODUCTION 

1.1 WAVELETS AND SIGNAL PROCESSING 


Wavelet theory provides a unified framework for a number of techniques
which had been developed independently for various signal processing applications.
For example, multiresolution signal processing, used in computer vision; subband
coding, developed for speech and image compression; and wavelet series expansions,
developed in applied mathematics, have recently been recognized as different views
of a single theory.

Wavelet theory covers quite a large area. It treats both the continuous and 
discrete time cases. It provides very general techniques that can be applied to 
many tasks in signal processing and therefore has many potential applications. 

In particular, the Wavelet Transform (WT) is of interest for the analysis of
non-stationary signals, because it provides an alternative to the classical
Short-Time Fourier Transform (STFT) or Gabor Transform (GT). The basic
difference is as follows. In contrast to the STFT, which uses a single analysis
window, the WT uses short windows at high frequencies and long windows at low
frequencies. This is in accordance with the so-called "constant-Q" or constant
relative bandwidth frequency analysis.

For some applications it is desirable to see the WT as a signal
decomposition onto a set of basis functions. In fact, basis functions called wavelets
always underlie the wavelet analysis. They are obtained from a single prototype
wavelet by dilations and contractions (scaling) as well as translations (shifts). The
prototype wavelet can be thought of as a band pass filter, and the constant-Q
property of the other band pass filters (wavelets) follows because they are scaled
versions of the prototype.

Therefore, in a WT, the notion of scale is introduced as an alternative to
frequency, leading to a so-called time-scale representation. This means that a
signal is mapped into a time-scale plane (the equivalent of the time-frequency
plane used in the STFT).

There are several types of wavelet transforms, and depending on the
application one may be preferred to the others. For a continuous input signal, the
time and scale parameters can be continuous [4], leading to the Continuous
Wavelet Transform (CWT). They may as well be discrete [5, 3, 6], leading to a
Discrete Wavelet Transform (DWT). In the latter case it uses multirate signal
processing techniques [7] and is related to the subband coding schemes used in speech
and image compression.

Wavelet theory has been developed as a unifying framework only recently,
although similar ideas and constructions took place as early as the beginning of
the century, e.g., the works of P. Franklin (1928), A. Haar (1910), J. Littlewood and
R. Paley (1937), A. Calderon (1964), etc. The idea of looking at a signal at various
scales and analyzing it with various resolutions has in fact emerged independently
in many different fields of mathematics, physics and engineering. In the late
eighties, Daubechies and Mallat, in addition to their contribution to the theory of
wavelets, established connections to discrete signal processing results [5, 1]. Since
then, a number of theoretical as well as practical contributions have been made on
various aspects of WTs, and the subject is growing rapidly [8].
1.2 MULTIRESOLUTION SIGNAL DECOMPOSITION

In computer vision, it is difficult to analyze the information content of an
image directly from the gray-level intensity of the image pixels. Indeed, this value
depends on the lighting conditions. More important are the local variations of the
image intensity. The size of the neighborhood where the contrast is computed must
be adapted to the size of the objects that we want to analyze [9]. This size
defines a resolution of reference for measuring the local variations of the image.
Generally the structures we want to recognize have very different sizes. Hence, it
is not possible to define a priori an optimal resolution for analyzing images.
Several researchers, for example E. Hall, J. Rouge, R. Wong, D. Marr, T. Poggio,
A. Rosenfeld and M. Thurston, have developed pattern matching algorithms which
process the image at different resolutions. For this purpose, one can reorganize the
image information into a set of details appearing at different resolutions. Given a
sequence of increasing resolutions $(r_j)_{j \in Z}$, the details of an image at the
resolution $r_j$ are defined as the difference of information between its approximation
at the resolution $r_j$ and its approximation at the resolution $r_{j-1}$.

A multiresolution decomposition helps us to have a scale-invariant
interpretation of the image. The scale of an image depends upon the distance
between the scene and the optical center of the camera. When the image scale is
modified, our interpretation of the scene should not change. A multiresolution
representation can be partially scale-invariant if the sequence of resolution
parameters $(r_j)_{j \in Z}$ varies exponentially. Assume that there exists a resolution step
$\alpha \in R$ such that for all integers j, $r_j = \alpha^j$. If the camera gets $\alpha$ times closer to
the scene, each object of the scene is projected onto an area $\alpha^2$ times bigger in
the focal plane of the camera. That is, each object is measured at a resolution $\alpha$
times bigger. Hence the details of this new image at the resolution $\alpha^j$ correspond to
the details of the previous image at the resolution $\alpha^{j+1}$. Rescaling the image by $\alpha$
translates the image details along the resolution axis. If the image details are
processed identically at all resolutions, our interpretation of the image information
is not modified.

A multiresolution representation provides a simple hierarchical framework for
interpreting the image information, as shown by the work of J. Koenderink. At
different resolutions, the details of the image generally characterize different
physical structures of the scene. At a coarse resolution, these details correspond to
the larger structures which provide the image "context". It is therefore natural to
analyze first the image details at a coarse resolution and then gradually increase
the resolution. Such a coarse-to-fine strategy is useful for pattern recognition
algorithms.

Burt [13] and Crowley have each introduced pyramidal implementations for
computing the signal details at different resolutions. In order to simplify the
computations, Burt has chosen a resolution step $\alpha$ equal to 2. The details at each
resolution $2^j$ are calculated by filtering the original image with the difference of
two low-pass filters and by subsampling the resulting image by a factor $2^j$. This
operation is performed over a finite range of resolutions. In this implementation,
the difference of two low-pass filters gives an approximation of the Laplacian of
the Gaussian. The details at different resolutions are regrouped into a pyramid
structure called the Laplacian pyramid. However, the Laplacian pyramid structures,
as studied by Burt and Crowley, suffer from the difficulty that data at several
levels are correlated. There is no clear model which handles this correlation. It is
thus difficult to know whether a similarity between the image details at different
resolutions is a characteristic of the image itself or is due to the intrinsic
redundancy of the representation. Furthermore, the Laplacian multiresolution
representation does not introduce any spatial orientation selectivity into the
decomposition process. This spatial homogeneity can be inconvenient for pattern
recognition problems such as texture discrimination, as shown by the work of B. Julesz.

Here, we have transformed a function into an approximation at a
resolution $2^j$. The difference of information between two approximations at the
resolutions $2^{j+1}$ and $2^j$ is extracted by decomposing the function in a wavelet
orthonormal basis. This decomposition defines a complete and orthogonal
multiresolution representation called the wavelet representation. Wavelets have been
introduced by Grossmann and Morlet as functions $\psi(x)$ whose translations and
dilations $(\sqrt{s}\, \psi(sx - t))_{(s,t) \in R^+ \times R}$ can be used for expansions of $L^2(R)$
functions. Meyer [35] showed that there exist wavelets $\psi(x)$ such that
$(\sqrt{2^j}\, \psi(2^j x - k))_{(j,k) \in Z^2}$ is an orthonormal basis of $L^2(R)$. These bases generalise the
Haar basis. The wavelet orthonormal bases provide an important new tool in
functional analysis. Indeed, before them it had been believed that no construction
could yield simple orthonormal bases of $L^2(R)$ whose elements had good
localization properties in both the spatial and Fourier domains.

The multiresolution approach to wavelets enables us to characterize the class
of functions $\psi(x) \in L^2(R)$ that generate an orthonormal basis. The model is first
described for one-dimensional signals and then extended to two dimensions for
image processing. The wavelet representation of images discriminates several spatial
orientations. The computation of the wavelet representation may be accomplished
with a pyramidal algorithm based on convolutions with quadrature mirror filters.
The signal can also be reconstructed from a wavelet representation with a similar
pyramidal algorithm.
1.3 THE ALGORITHMS

$A_{2^j}$ is the operator which approximates a signal at a resolution $2^j$. Since
computers can only process discrete signals, we must work with discrete
approximations. Thus $A^d_{2^j} f$ is called a discrete approximation of f(x) at the
resolution $2^j$. $D_{2^j} f$ is called the discrete detail signal at the resolution $2^j$. It
contains the difference of information between $A^d_{2^{j+1}} f$ and $A^d_{2^j} f$. The original
discrete signal $A^d_1 f$ measured at the resolution 1 is represented by

$$\left( A^d_{2^{-J}} f,\ (D_{2^j} f)_{-J \le j \le -1} \right).$$

This set of discrete signals is called an orthogonal wavelet representation, and
consists of the reference signal at a coarse resolution $A^d_{2^{-J}} f$ and the detail signals
at the resolutions $2^j$ for $-J \le j \le -1$. It can be interpreted as a decomposition of
the original signal in an orthonormal wavelet basis. $\psi(x)$ is the orthogonal wavelet
and the wavelet orthonormal basis is built by scaling and translating the function
$\psi(x)$. $\phi(x)$ is the scaling function and H the corresponding conjugate filter. The
impulse response of the filter G is related to the impulse response of the filter H
by

$$g(n) = (-1)^{1-n}\, h(1-n).$$

G is the mirror filter of H, and is a high pass filter. In signal processing, G and
H are called quadrature mirror filters.
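
The relation between the two impulse responses can be tried out on a small filter. The following is a minimal sketch, assuming the Haar low-pass filter with taps h(0) = h(1) = 1/2 (a normalization in which the taps sum to 1); the function name `mirror_filter` and the dictionary representation of a filter are illustrative choices, not taken from the thesis.

```python
def mirror_filter(h):
    """Build g from h using g(n) = (-1)**(1 - n) * h(1 - n).

    A filter is stored as a dict {n: tap value} holding its non-zero taps.
    """
    g = {}
    for m, value in h.items():
        n = 1 - m                          # g(n) uses the tap h(1 - n) = h(m)
        sign = -1 if (1 - n) % 2 else 1    # (-1)**(1 - n)
        g[n] = sign * value
    return g


# Haar low-pass filter (assumed example, taps sum to 1).
h_haar = {0: 0.5, 1: 0.5}
g_haar = mirror_filter(h_haar)
print(g_haar)  # {1: 0.5, 0: -0.5}
```

The taps of g sum to zero, which is what one expects from a high pass filter: a constant input produces no output.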

The wavelet model can be generalized to any dimension n > 0. We have
specifically studied the two-dimensional case for image processing applications. The
signal is now a finite energy function $f(x,y) \in L^2(R^2)$. A multiresolution
approximation of $L^2(R^2)$ is a sequence of subspaces of $L^2(R^2)$ which satisfies a
straightforward two-dimensional extension of the one-dimensional properties.
The Decomposition Algorithm: The two-dimensional wavelet
transform can be seen as a one-dimensional wavelet transform along the x and y
axes. It can be computed with a separable extension of the one-dimensional
decomposition algorithm. At each step we decompose $A^d_{2^{j+1}} f$ into $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$,
and $D^3_{2^j} f$. This algorithm is illustrated by the block diagram in Figure 2.1. We
first convolve the rows of $A^d_{2^{j+1}} f$ with a one-dimensional filter, retain every other
column, convolve the columns of the resulting signals with another one-dimensional
filter and retain every other row. The filters used in this decomposition are the
quadrature mirror filters $\tilde{H}$ and $\tilde{G}$, which are related to the filters H and G as
follows:

$$\tilde{g}(n) = g(-n) \quad \text{and} \quad \tilde{h}(n) = h(-n).$$

The structure of application of the filters is given in Figure 2.1. We compute
the wavelet transform of an image $A^d_1 f$ by repeating this process for $-1 \ge j \ge -J$.
This corresponds to a separable conjugate mirror filter decomposition.
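
One step of the separable decomposition can be sketched in a few lines. This is only an illustration under stated assumptions: Haar filters with taps summing to 1, periodic wrap-around at the borders, and the labelling D1 = (low-pass along rows, high-pass along columns), D2 = (high, low), D3 = (high, high). The thesis implementation is in Occam; the names `analyze_1d` and `decompose_2d` are hypothetical.

```python
def analyze_1d(signal, taps):
    """Convolve with the mirror filter and keep every other sample.

    Computes out[n] = sum_m taps[m] * signal[2n + m], i.e. the convolution
    with taps~(m) = taps(-m) sampled at even positions. Indices wrap
    around (periodic extension, an assumption of this sketch).
    """
    size = len(signal)
    return [
        sum(c * signal[(2 * n + m) % size] for m, c in taps.items())
        for n in range(size // 2)
    ]


def transpose(block):
    return [list(column) for column in zip(*block)]


def decompose_2d(image, h, g):
    """One decomposition step: image -> (approximation, D1, D2, D3)."""
    # Row pass: filter along each row, which halves the number of columns.
    low = [analyze_1d(row, h) for row in image]
    high = [analyze_1d(row, g) for row in image]

    # Column pass: filter along each column, which halves the number of rows.
    def column_pass(block, taps):
        return transpose([analyze_1d(col, taps) for col in transpose(block)])

    return (column_pass(low, h), column_pass(low, g),
            column_pass(high, h), column_pass(high, g))


H = {0: 0.5, 1: 0.5}    # Haar low-pass (assumed example)
G = {0: -0.5, 1: 0.5}   # Haar high-pass, g(n) = (-1)**(1 - n) * h(1 - n)

print(decompose_2d([[1, 2], [3, 4]], H, G))
# ([[2.5]], [[1.0]], [[0.5]], [[0.0]])
```

Repeating `decompose_2d` on the returned approximation gives the next coarser level, which is the iteration over j described above.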

The Reconstruction Algorithm: The one-dimensional
reconstruction algorithm can also be extended to two dimensions. At each step, the
image $A^d_{2^{j+1}} f$ is reconstructed from $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$. This algorithm is
illustrated in Figure 2.2. Between each column of the coarse and detail images we
add a column of zeros, convolve the rows with a one-dimensional filter, add a row
of zeros between each row of the resulting image, and convolve the columns with
another one-dimensional filter. The filters used in reconstruction are the quadrature
mirror filters H and G. The image $A^d_1 f$ is reconstructed from its wavelet transform
by repeating this process for $-J \le j \le -1$. Figure 2.2 shows the reconstruction of
the original image from its wavelet representation.
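
The reconstruction step can be sketched in the same spirit. The code below is an assumption-laden illustration, not the thesis implementation: it uses Haar filters whose taps sum to 1, periodic borders, a gain of 2 per one-dimensional pass (which that normalization requires), and the subband labelling D1 = (low along rows, high along columns), D2 = (high, low), D3 = (high, high). All function names are hypothetical.

```python
def synthesize_1d(coarse, taps):
    """Insert a zero between samples, convolve with taps, multiply by 2.

    Equivalent to out[n] = 2 * sum_k taps[n - 2k] * coarse[k], with
    indices taken modulo the output length (periodic extension).
    """
    size = 2 * len(coarse)
    out = [0.0] * size
    for k, value in enumerate(coarse):
        for m, c in taps.items():
            out[(2 * k + m) % size] += 2 * c * value
    return out


def transpose(block):
    return [list(column) for column in zip(*block)]


def reconstruct_2d(approx, d1, d2, d3, h, g):
    """One reconstruction step: four subbands -> image of twice the size."""
    def expand(low_block, high_block):
        # Expand every line with H (low band) and G (high band), then add.
        return [
            [p + q for p, q in zip(synthesize_1d(lo, h), synthesize_1d(hi, g))]
            for lo, hi in zip(low_block, high_block)
        ]

    # Expand along the rows first (zeros inserted between columns) ...
    column_low = expand(approx, d2)
    column_high = expand(d1, d3)
    # ... then along the columns (zeros inserted between rows).
    return transpose(expand(transpose(column_low), transpose(column_high)))


H = {0: 0.5, 1: 0.5}    # Haar low-pass (assumed example)
G = {0: -0.5, 1: 0.5}   # Haar high-pass

print(reconstruct_2d([[2.5]], [[1.0]], [[0.5]], [[0.0]], H, G))
# [[1.0, 2.0], [3.0, 4.0]]
```

The four input values are the Haar subbands of the image [[1, 2], [3, 4]] under the stated conventions, so the output shows that the step recovers the finer image exactly.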



1.4 TRANSPUTERS AND OCCAM


The word Transputer may be interpreted as a contraction of the words
transceiver and computer. The best known Transputers are the IMS T212, T414
and T800. The T212 is a 16-bit processor; the other two are 32-bit processors.
The T800 has a full IEEE floating-point processor on chip. Communication between
the processes is achieved by means of channels. Occam is the native language for
Transputers. To gain the most benefit from the Transputer architecture, the whole
system can be programmed in Occam. This provides all the advantages of a high
level language, the maximum program efficiency and the ability to use the special
features of the Transputer. The pattern in which the processors are connected
together is known as the topology or the configuration. Designing a topology for a
large system can be difficult. Transputer software is mostly developed under the
Transputer Development System (TDS) environment supplied by INMOS.

1.5 IMPLEMENTATION OF THE ALGORITHMS
ON A TRANSPUTER NETWORK

The above decomposition and reconstruction algorithms have been
implemented using transputers. In any image processing application, the time taken
by the procedure is of crucial importance, especially so in real time applications.
The concept of parallel processing comes in handy here. Our equipment consists of
a root processor and 8 other processors. The root processor and 4 other processors
are of the T800 type while the remaining 4 are of the T414 type.

The configuration being considered is a linear array of the processors with
data flow along a linear path. We have utilized the concept of data-partitioning
or, in parallel processing parlance, "coarse grain concurrency".
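
As a rough illustration of data-partitioning, the sketch below splits the rows of an image into contiguous strips, one per worker processor, so that each processor can filter its own rows independently. The strip layout, the 256-row image size and the name `partition_rows` are assumptions made for the example; this section of the thesis does not specify how the data is divided.

```python
def partition_rows(row_count, workers):
    """Split range(row_count) into contiguous (start, stop) strips.

    Strip sizes differ by at most one row, so the load is balanced
    when every row costs the same amount of work.
    """
    base, extra = divmod(row_count, workers)
    strips = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        strips.append((start, start + size))
        start += size
    return strips


# Hypothetical case: a 256-row image shared among 8 worker transputers.
strips = partition_rows(256, 8)
print(strips[0], strips[-1])  # (0, 32) (224, 256)
```

On a linear array, the root would pass each strip down the chain until it reaches its worker, and the results would travel back along the same path.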
1.6 ORGANIZATION OF THE THESIS

The Thesis has been organized as follows: 

Chapter 1 is a brief introduction to the entire work. Its purpose is to give
an overview of the theoretical as well as the implementation aspects of the topic.
It covers the role of wavelets in signal processing, the concept of multiresolution
transforms, the decomposition and reconstruction algorithms, and an introduction to
Transputers and their language Occam.

Chapter 2 gives a brief survey of the theoretical aspects related to the wavelet
representation. It starts with an introduction to the notation used throughout
the thesis. It then dwells on the multiresolution approximation of $L^2(R)$ and
also enunciates the properties of $A_{2^j}$, which is the operator approximating the
signal at a resolution $2^j$. The wavelet representation and its implementation are
then discussed. Next, the theoretical basis of signal reconstruction from an
orthogonal representation is studied. All the previous concepts are then extended to
the domain of images, i.e., two dimensions. The decomposition and reconstruction
algorithms in the case of images are discussed.

Chapter 3 gives a brief insight into Transputers, the machines used for
implementing the algorithms. It introduces the Transputers and then gives the
architectural details like the instruction processor, the instruction set, the memory
controller, the process scheduler, internal and external communication, the
communication links, the timer and the floating point processor. Occam is then
introduced and a brief review of programming transputers is given. It also gives
the hardware setup of various network configurations including the actual
configuration which has been used to implement the decomposition and
reconstruction algorithms.
Chapter 4 discusses the actual data transfer mechanism. It also gives the
details of the image used, the choice of basis functions, the introduction of noise,
etc.

Chapter 5 concludes the thesis with a brief discussion of the results
achieved, the conclusions drawn, and some suggestions for further work.

The Appendix describes the way to derive the filter coefficients from the basis
functions.



CHAPTER 2 


MULTIRESOLUTION TRANSFORM 


2.1 NOTATION. 


Z and R denote the set of integers and real numbers respectively. $L^2(R)$
denotes the vector space of measurable, square-integrable one-dimensional functions
f(x). For $f(x) \in L^2(R)$ and $g(x) \in L^2(R)$, the inner product of f(x) with g(x) is
written

$$\langle g(u), f(u) \rangle = \int_{-\infty}^{+\infty} g(u)\, f(u)\, du.$$

The norm of f(x) in $L^2(R)$ is given by

$$\| f \|^2 = \int_{-\infty}^{+\infty} |f(u)|^2\, du.$$

The convolution of two functions $f(x) \in L^2(R)$ and $g(x) \in L^2(R)$ is denoted by

$$f * g(x) = (f(u) * g(u))(x) = \int_{-\infty}^{+\infty} f(u)\, g(x-u)\, du.$$

The Fourier transform of $f(x) \in L^2(R)$ is written $\hat{f}(\omega)$ and is defined by

$$\hat{f}(\omega) = \int_{-\infty}^{+\infty} f(x)\, e^{-j\omega x}\, dx.$$

$l^2(Z)$ is the vector space of square-summable sequences:

$$l^2(Z) = \left\{ (\alpha_i)_{i \in Z} : \sum_{i=-\infty}^{+\infty} |\alpha_i|^2 < \infty \right\}.$$

$L^2(R^2)$ is the vector space of measurable, square-integrable two-dimensional
functions f(x,y). For $f(x,y) \in L^2(R^2)$ and $g(x,y) \in L^2(R^2)$, the inner product of
f(x,y) with g(x,y) is written

$$\langle f(x,y), g(x,y) \rangle = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y)\, g(x,y)\, dx\, dy.$$

The Fourier transform of $f(x,y) \in L^2(R^2)$ is written $\hat{f}(\omega_x, \omega_y)$ and is defined by

$$\hat{f}(\omega_x, \omega_y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y)\, e^{-j(\omega_x x + \omega_y y)}\, dx\, dy.$$





2.2 MULTIRESOLUTION APPROXIMATION OF $L^2(R)$

Let $A_{2^j}$ be the operator which approximates the signal at a
resolution $2^j$. We suppose that our original signal f(x) is measurable and has a
finite energy: $f(x) \in L^2(R)$. We characterize $A_{2^j}$ from the properties one would
expect from such an approximation operator.

(1) $A_{2^j}$ is a linear operator. If $A_{2^j} f(x)$ is the approximation of some function
f(x) at the resolution $2^j$, then $A_{2^j} f(x)$ is not modified if we approximate it again
at the resolution $2^j$. This principle shows that $A_{2^j} \circ A_{2^j} = A_{2^j}$. The operator
$A_{2^j}$ is thus a projection operator on a particular vector space $V_{2^j} \subset L^2(R)$. The
vector space $V_{2^j}$ can be interpreted as the set of all possible approximations at
the resolution $2^j$ of functions in $L^2(R)$.

(2) Among all approximated functions at the resolution $2^j$, $A_{2^j} f(x)$ is the
function which is most similar to f(x).

$$\forall g(x) \in V_{2^j}, \quad \| g(x) - f(x) \| \ge \| A_{2^j} f(x) - f(x) \| \tag{1}$$

Hence the operator $A_{2^j}$ is an orthogonal projection on the vector space $V_{2^j}$.

(3) The approximation of a signal at a resolution $2^{j+1}$ contains all the necessary
information to compute the same signal at a smaller resolution $2^j$. This is a
causality property. Since $A_{2^j}$ is a projection operator on $V_{2^j}$, this principle is
equivalent to

$$\forall j \in Z, \quad V_{2^j} \subset V_{2^{j+1}}. \tag{2}$$

(4) An approximation operation is similar at all resolutions. The spaces of
approximated functions should thus be derived from one another by scaling each
approximated function by the ratio of their resolution values.

$$\forall j \in Z, \quad f(x) \in V_{2^j} \iff f(2x) \in V_{2^{j+1}} \tag{3}$$

(5) The approximation $A_{2^j} f(x)$ of a signal f(x) can be characterized by $2^j$
samples per length unit. When f(x) is translated by a length proportional to $2^{-j}$,
$A_{2^j} f(x)$ is translated by the same amount and is characterized by the same
samples which have been translated. As a consequence of (3), it is sufficient to
express the principle (5) for the resolution j = 0. The mathematical translation
consists of the following.

Discrete characterization:

There exists an isomorphism I from $V_1$ onto $l^2(Z)$. (4)

Translation of the approximation:

$$\forall k \in Z, \quad A_1 f_k(x) = A_1 f(x-k), \quad \text{where } f_k(x) = f(x-k). \tag{5}$$

Translation of the samples:

$$I(A_1 f(x)) = (\alpha_i)_{i \in Z} \iff I(A_1 f_k(x)) = (\alpha_{i-k})_{i \in Z} \tag{6}$$

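(6) When computing an approximation of f(x) at resolution $2^j$, some information
about f(x) is lost. However, as the resolution increases to $+\infty$ the approximated
signal should converge to the original signal. Conversely, as the resolution decreases
to zero, the approximated signal contains less and less information and converges
to zero.

Since the approximated signal at resolution $2^j$ is equal to the orthogonal
projection on the space $V_{2^j}$, this principle can be written

$$\lim_{j \to +\infty} V_{2^j} = \bigcup_{j=-\infty}^{+\infty} V_{2^j} \ \text{is dense in } L^2(R) \tag{7}$$

$$\lim_{j \to -\infty} V_{2^j} = \bigcap_{j=-\infty}^{+\infty} V_{2^j} = \{0\}. \tag{8}$$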
We call any set of vector spaces $(V_{2^j})_{j \in Z}$ which satisfies the properties
(2)-(8) a multiresolution approximation of $L^2(R)$. The associated set of operators
$A_{2^j}$ satisfying (1)-(6) gives the approximation of any $L^2(R)$ function at a
resolution $2^j$.
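
The projection and causality properties can be checked on a concrete case. The sketch below assumes the simplest multiresolution approximation, in which approximations at the resolution $2^j$ (j ≤ 0) are piecewise constant over blocks of $2^{-j}$ samples of a signal measured at resolution 1; the orthogonal projection is then the block average. The function name `approximate` is illustrative only.

```python
def approximate(samples, j):
    """Project onto signals that are constant on blocks of 2**(-j) samples.

    j must be <= 0; j = 0 returns the samples unchanged.
    """
    block = 2 ** (-j)
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / len(chunk)   # best constant in the least-squares sense
        out.extend([mean] * len(chunk))
    return out


f = [4, 2, 5, 7, 1, 3, 0, 8]
a1 = approximate(f, -1)   # [3.0, 3.0, 6.0, 6.0, 2.0, 2.0, 4.0, 4.0]
a2 = approximate(f, -2)   # [4.5, 4.5, 4.5, 4.5, 3.0, 3.0, 3.0, 3.0]

# Projection: approximating twice at the same resolution changes nothing.
print(approximate(a1, -1) == a1)   # True
# Causality: the coarser approximation can be computed from the finer one.
print(approximate(a1, -2) == a2)   # True
```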


Theorem 1: Let $(V_{2^j})_{j \in Z}$ be a multiresolution approximation of $L^2(R)$.
There exists a unique function $\phi(x) \in L^2(R)$, called a scaling function, such that
if we set $\phi_{2^j}(x) = 2^j \phi(2^j x)$ for $j \in Z$ (the dilation of $\phi(x)$ by $2^j$), then

$$\left( \sqrt{2^{-j}}\, \phi_{2^j}(x - 2^{-j} n) \right)_{n \in Z} \ \text{is an orthonormal basis of } V_{2^j}. \tag{9}$$

This theorem shows that we can build an orthonormal basis of any $V_{2^j}$ by
dilating a function $\phi(x)$ with a coefficient $2^j$ and translating the resulting function
on a grid whose interval is proportional to $2^{-j}$. The function $\phi_{2^j}(x)$ is normalized
with respect to the $L^1(R)$ norm. The coefficient $\sqrt{2^{-j}}$ appears in the basis set in
order to normalize the functions in the $L^2(R)$ norm. For a given multiresolution
approximation $(V_{2^j})_{j \in Z}$, there exists a unique scaling function which satisfies
(9). However, for different multiresolution approximations, the scaling functions are
different. In general, we want to have a smooth scaling function. The Fourier
transform of a continuously differentiable and exponentially decreasing scaling
function has the shape of a low-pass filter.



The orthogonal projection on $V_{2^j}$ can now be computed by decomposing the
signal f(x) on the orthonormal basis given by Theorem 1. The approximation of
the signal f(x) at the resolution $2^j$ is thus characterized by the set of inner
products which we denote by

$$A^d_{2^j} f = \left( \langle f(u), \phi_{2^j}(u - 2^{-j} n) \rangle \right)_{n \in Z}.$$

$A^d_{2^j} f$ is called the discrete approximation of f(x) at the resolution $2^j$. Since
computers can only process discrete signals, we must work with discrete
approximations. Each inner product can also be interpreted as a convolution
product evaluated at a point $2^{-j} n$:

$$\langle f(u), \phi_{2^j}(u - 2^{-j} n) \rangle = \int_{-\infty}^{+\infty} f(u)\, \phi_{2^j}(u - 2^{-j} n)\, du = (f(u) * \phi_{2^j}(-u))(2^{-j} n).$$

Hence we can rewrite $A^d_{2^j} f$ as

$$A^d_{2^j} f = \left( (f(u) * \phi_{2^j}(-u))(2^{-j} n) \right)_{n \in Z}.$$

Since $\phi(x)$ is a low-pass filter, this discrete signal can be interpreted as a
low-pass filtering of f(x) followed by a uniform sampling at the rate $2^j$. In an
approximation operation, when removing the details of f(x) smaller than $2^{-j}$, we
suppress the highest frequencies of this function. The scaling function $\phi(x)$ forms a
very particular low-pass filter since the family of functions
$(\sqrt{2^{-j}}\, \phi_{2^j}(x - 2^{-j} n))_{n \in Z}$ is an orthonormal family.

The discrete approximation of f(x) at the resolution $2^j$ can be computed
with a pyramidal algorithm.



2.3 IMPLEMENTATION OF A MULTIRESOLUTION TRANSFORM 


In practice, a physical measuring device can only measure a signal at a
finite resolution. For normalization purposes, we suppose that this resolution is
equal to 1. Let $A^d_1 f$ be the discrete approximation at the resolution 1 that is
measured. The causality principle says that from $A^d_1 f$ we can compute all the
discrete approximations $A^d_{2^j} f$ for j < 0. In this section, we describe a simple
iterative algorithm for calculating these discrete approximations.

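Let $(V_{2^j})_{j \in Z}$ be a multiresolution approximation and $\phi(x)$ be the
corresponding scaling function. The family of functions
$(\sqrt{2^{-j-1}}\, \phi_{2^{j+1}}(x - 2^{-j-1} k))_{k \in Z}$ is an orthonormal basis of $V_{2^{j+1}}$. We know that
for any $n \in Z$, the function $\phi_{2^j}(x - 2^{-j} n)$ is a member of $V_{2^j}$, which is included
in $V_{2^{j+1}}$. It can thus be expanded in this orthonormal basis of $V_{2^{j+1}}$:

$$\phi_{2^j}(x - 2^{-j} n) = 2^{-j-1} \sum_{k=-\infty}^{+\infty} \langle \phi_{2^j}(u - 2^{-j} n), \phi_{2^{j+1}}(u - 2^{-j-1} k) \rangle\, \phi_{2^{j+1}}(x - 2^{-j-1} k) \tag{12}$$

By changing variables in the inner product integral, one can show that

$$2^{-j-1} \langle \phi_{2^j}(u - 2^{-j} n), \phi_{2^{j+1}}(u - 2^{-j-1} k) \rangle = \langle \phi_{2^{-1}}(u), \phi(u - (k - 2n)) \rangle \tag{13}$$

When computing the inner products of f(x) with both sides of (12) we obtain

$$\langle f(u), \phi_{2^j}(u - 2^{-j} n) \rangle = \sum_{k=-\infty}^{+\infty} \langle \phi_{2^{-1}}(u), \phi(u - (k - 2n)) \rangle\, \langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1} k) \rangle \tag{14}$$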


Let H be the discrete filter whose impulse response is given by

$$\forall n \in Z, \quad h(n) = \langle \phi_{2^{-1}}(u), \phi(u - n) \rangle.$$

Let $\tilde{H}$ be the mirror filter with impulse response $\tilde{h}(n) = h(-n)$. By
inserting this definition in (14), we obtain

$$\langle f(u), \phi_{2^j}(u - 2^{-j} n) \rangle = \sum_{k=-\infty}^{+\infty} \tilde{h}(2n - k)\, \langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1} k) \rangle \tag{15}$$

Equation (15) shows that $A^d_{2^j} f$ can be computed by convolving $A^d_{2^{j+1}} f$ with
$\tilde{H}$ and keeping every other sample of the output. All the discrete approximations
$A^d_{2^j} f$, for j < 0, can thus be computed from $A^d_1 f$ by repeating this process.
This operation is called a pyramid transform.
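
Equation (15) translates almost directly into code. The following sketch assumes the Haar filter h(0) = h(1) = 1/2 as a stand-in for H and simply drops any tap that would reach past the ends of the signal (which never happens for Haar on an even-length signal); the names `mirror`, `pyramid_step` and `pyramid` are hypothetical.

```python
def mirror(taps):
    """Return the mirror filter: h~(n) = h(-n)."""
    return {-n: c for n, c in taps.items()}


def pyramid_step(approx, h):
    """One step of (15): a_j(n) = sum_k h~(2n - k) * a_{j+1}(k)."""
    h_tilde = mirror(h)
    size = len(approx)
    return [
        sum(h_tilde.get(2 * n - k, 0.0) * approx[k] for k in range(size))
        for n in range(size // 2)
    ]


def pyramid(signal, h, levels):
    """Discrete approximations at resolutions 1, 1/2, ..., 2**(-levels)."""
    out = [list(signal)]
    for _ in range(levels):
        out.append(pyramid_step(out[-1], h))
    return out


haar = {0: 0.5, 1: 0.5}   # assumed example filter
for level in pyramid([4, 2, 5, 7, 1, 3, 0, 8], haar, 3):
    print(level)
# [4, 2, 5, 7, 1, 3, 0, 8]
# [3.0, 6.0, 2.0, 4.0]
# [4.5, 3.0]
# [3.75]
```

Each step halves the number of samples, so computing every level costs roughly twice the work of the first step.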

In practice, the measuring device gives only a finite number of samples:
$A^d_1 f = (\alpha_n)_{1 \le n \le N}$. Each discrete signal $A^d_{2^j} f$ (j < 0) has $2^j N$ samples. In
order to avoid border problems while computing the discrete approximations $A^d_{2^j} f$,
we suppose that the original signal $A^d_1 f$ is symmetric with respect to n = 0 and
n = N:

$$\alpha_n = \begin{cases} \alpha_{-n} & \text{if } -N < n < 0 \\ \alpha_{2N-n} & \text{if } 0 < n < N \end{cases}$$
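
The two symmetries together make the extended signal periodic with period 2N, so any index can be folded back into the measured range. The helper below is a sketch of that folding; the name `reflect` is hypothetical, and it assumes a sample is available at n = 0, which the text above does not state (the measured samples are indexed from 1 to N).

```python
def reflect(n, N):
    """Fold any integer index n into 0..N using symmetry about 0 and N."""
    m = n % (2 * N)                  # both symmetries together give period 2N
    return 2 * N - m if m > N else m


# Indices -4..8 for N = 4 map back into 0..4.
print([reflect(n, 4) for n in range(-4, 9)])
# [4, 3, 2, 1, 0, 1, 2, 3, 4, 3, 2, 1, 0]
```

A filtering routine can then read `signal[reflect(k, N)]` wherever the convolution asks for a sample outside the measured range.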


Theorem 1 shows that a multiresolution approximation $(V_{2^j})_{j \in Z}$ is
completely characterized by the scaling function $\phi(x)$. A scaling function can be
defined as a function $\phi(x) \in L^2(R)$ such that, for all $j \in Z$,
$(\sqrt{2^{-j}}\, \phi_{2^j}(x - 2^{-j} n))_{n \in Z}$ is an orthonormal family, and if $V_{2^j}$ is the vector space
generated by this family of functions, then $(V_{2^j})_{j \in Z}$ is a multiresolution
approximation of $L^2(R)$.

We also impose a regularity condition on the scaling functions. A scaling function
$\phi(x)$ must be continuously differentiable and the asymptotic decay of $\phi(x)$ and
$\phi'(x)$ at infinity must satisfy

$$|\phi(x)| = O(x^{-2}) \quad \text{and} \quad |\phi'(x)| = O(x^{-2}).$$

The following theorem gives a practical characterization of the Fourier transform
of a scaling function.

Theorem 2: Let $\phi(x)$ be a scaling function, and let H be a discrete filter
with impulse response $h(n) = \langle \phi_{2^{-1}}(u), \phi(u - n) \rangle$. Let $H(\omega)$ be the Fourier
series defined by

$$H(\omega) = \sum_{n=-\infty}^{+\infty} h(n)\, e^{-jn\omega} \tag{16}$$

$H(\omega)$ satisfies the following two properties:

$$|H(0)| = 1 \quad \text{and} \quad h(n) = O(n^{-2}) \ \text{at infinity.} \tag{16a}$$

$$|H(\omega)|^2 + |H(\omega + \pi)|^2 = 1 \tag{16b}$$

Conversely, let $H(\omega)$ be a Fourier series satisfying (16a) and (16b) and such that

$$|H(\omega)| \ne 0 \quad \text{for } \omega \in [0, \pi/2]. \tag{16c}$$

The function defined by

$$\hat{\phi}(\omega) = \prod_{p=1}^{+\infty} H(2^{-p} \omega) \tag{17}$$

is the Fourier transform of a scaling function.

The filters that satisfy property (16b) are called conjugate filters. Given a
conjugate filter H which satisfies (16a)-(16c), we can then compute the Fourier
transform of the corresponding scaling function with (17). It is possible to choose
$H(\omega)$ in order to obtain a scaling function $\phi(x)$ which has good localization
properties in both the frequency and spatial domains. The smoothness class of $\phi(x)$
and its asymptotic decay at infinity can be estimated from the properties of $H(\omega)$.
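
Conditions (16a) and (16b) are easy to test numerically for a finite filter. The sketch below evaluates $H(\omega)$ on a grid of frequencies; the Haar taps and the four-tap Daubechies taps (scaled so that they sum to 1) are standard examples chosen for illustration, not filters confirmed by this chapter, and the function names are hypothetical. Condition (16c) is not checked, and the decay condition in (16a) holds trivially for a finite filter.

```python
import cmath
import math


def transfer(taps, omega):
    """H(omega) = sum_n h(n) * exp(-j * n * omega)."""
    return sum(c * cmath.exp(-1j * n * omega) for n, c in taps.items())


def is_conjugate_filter(taps, samples=64, tol=1e-12):
    """Check |H(0)| = 1 and |H(w)|^2 + |H(w + pi)|^2 = 1 on a frequency grid."""
    if abs(abs(transfer(taps, 0.0)) - 1.0) > tol:
        return False
    for i in range(samples):
        w = 2 * math.pi * i / samples
        total = abs(transfer(taps, w)) ** 2 + abs(transfer(taps, w + math.pi)) ** 2
        if abs(total - 1.0) > tol:
            return False
    return True


haar = {0: 0.5, 1: 0.5}
s3 = math.sqrt(3)
daub4 = {0: (1 + s3) / 8, 1: (3 + s3) / 8, 2: (3 - s3) / 8, 3: (1 - s3) / 8}
smoothing = {0: 0.25, 1: 0.5, 2: 0.25}   # low-pass, but not a conjugate filter

print(is_conjugate_filter(haar), is_conjugate_filter(daub4),
      is_conjugate_filter(smoothing))
# True True False
```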

2.4 THE WAVELET REPRESENTATION 


As explained in the introduction, we wish to build a multiresolution
representation based on the differences of information available at two successive
resolutions $2^j$ and $2^{j+1}$. Such a representation can be computed by decomposing
the signal using a wavelet orthonormal basis.

The Detail Signal. The difference of information between the
approximation of a function f(x) at the resolutions $2^{j+1}$ and $2^j$ is called the
detail signal at the resolution $2^j$. The approximations of the signal at the resolutions
$2^j$ and $2^{j+1}$ are respectively equal to its orthogonal projections on $V_{2^j}$ and $V_{2^{j+1}}$.
By applying the projection theorem, we can easily show that the detail signal at
the resolution $2^j$ is given by the orthogonal projection of the original signal on the
orthogonal complement of $V_{2^j}$ in $V_{2^{j+1}}$. Let $O_{2^j}$ be this orthogonal complement,
i.e.,

$O_{2^j}$ is orthogonal to $V_{2^j}$,
$O_{2^j} \oplus V_{2^j} = V_{2^{j+1}}$.

To compute the orthogonal projection of a function f(x) on $O_{2^j}$, we need to find
an orthonormal basis of $O_{2^j}$. Much like Theorem 1, Theorem 3 shows that such a
basis can be built by scaling and translating a function $\psi(x)$.




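Theorem 3: Let $(V_{2^j})_{j \in Z}$ be a multiresolution vector space sequence, $\phi(x)$
the scaling function, and H the corresponding conjugate filter. Let $\psi(x)$ be a
function whose Fourier transform is given by

$$\hat{\psi}(\omega) = G(\omega/2)\, \hat{\phi}(\omega/2) \quad \text{with} \quad G(\omega) = e^{-j\omega}\, \overline{H(\omega + \pi)} \tag{18}$$

Let $\psi_{2^j}(x) = 2^j \psi(2^j x)$ denote the dilation of $\psi(x)$ by $2^j$. Then
$(\sqrt{2^{-j}}\, \psi_{2^j}(x - 2^{-j} n))_{n \in Z}$ is an orthonormal basis of $O_{2^j}$ and
$(\sqrt{2^{-j}}\, \psi_{2^j}(x - 2^{-j} n))_{(n,j) \in Z^2}$ is an orthonormal basis of $L^2(R)$.

$\psi(x)$ is called an orthogonal wavelet.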

An orthonormal basis of $O_{2^j}$ can thus be computed by scaling the wavelet
$\psi(x)$ with a coefficient $2^j$ and translating it on a grid whose interval is
proportional to $2^{-j}$. For computing a wavelet, we can define a function $H(\omega)$ which
satisfies the conditions (16a)-(16c) of Theorem 2, compute the corresponding
scaling function $\phi(x)$ with equation (17) and the wavelet $\psi(x)$ with (18).
Depending upon the choice of $H(\omega)$, the scaling function $\phi(x)$ and the wavelet $\psi(x)$
can have good localization both in the spatial and Fourier domains. Daubechies [9]
studied the properties of $\phi(x)$ and $\psi(x)$ depending upon $H(\omega)$. She shows that for
any n > 0, we can find a function $H(\omega)$ such that the corresponding wavelet
$\psi(x)$ has a compact support and is n times continuously differentiable.
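
The frequency-domain definition of G in Theorem 3 and the time-domain relation $g(n) = (-1)^{1-n} h(1-n)$ describe the same filter when h is real. The sketch below checks this numerically, taking the formula as $G(\omega) = e^{-j\omega}\, \overline{H(\omega + \pi)}$ with a complex conjugate on H (the usual form of the theorem). The four-tap Daubechies filter is an assumed example and the function names are hypothetical.

```python
import cmath
import math


def fourier_series(taps, omega):
    """sum_n taps(n) * exp(-j * n * omega)."""
    return sum(c * cmath.exp(-1j * n * omega) for n, c in taps.items())


def high_pass_taps(h):
    """g(n) = (-1)**(1 - n) * h(1 - n), written with m = 1 - n."""
    return {1 - m: (-1) ** m * c for m, c in h.items()}


def max_mismatch(h, samples=32):
    """Largest gap between G from its taps and exp(-jw) * conj(H(w + pi))."""
    g = high_pass_taps(h)
    worst = 0.0
    for i in range(samples):
        w = 2 * math.pi * i / samples
        from_taps = fourier_series(g, w)
        from_formula = cmath.exp(-1j * w) * fourier_series(h, w + math.pi).conjugate()
        worst = max(worst, abs(from_taps - from_formula))
    return worst


s3 = math.sqrt(3)
daub4 = {0: (1 + s3) / 8, 1: (3 + s3) / 8, 2: (3 - s3) / 8, 3: (1 - s3) / 8}
print(max_mismatch(daub4) < 1e-12)                              # True
print(abs(fourier_series(high_pass_taps(daub4), 0.0)) < 1e-12)  # True: G(0) = 0
```

The second check confirms that G removes the constant component, as expected of the filter that generates a wavelet.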

The decomposition of a signal in an orthonormal wavelet basis gives an
intermediate representation between Fourier and spatial representations. Due to this
double localization in the Fourier and the spatial domains, it is possible to
characterize the local regularity of a function f(x) based on the coefficients in a
wavelet orthonormal basis expansion [25].



Let $P_{O_{2^j}}$ be the orthogonal projection on the vector space $O_{2^j}$. As a
consequence of Theorem 3, this operator can now be written

$$P_{O_{2^j}} f(x) = 2^{-j} \sum_{n=-\infty}^{+\infty} \langle f(u), \psi_{2^j}(u - 2^{-j} n) \rangle\, \psi_{2^j}(x - 2^{-j} n). \tag{19}$$

$P_{O_{2^j}} f(x)$ yields the detail signal of f(x) at the resolution $2^j$. It is characterized
by the set of inner products

$$D_{2^j} f = \left( \langle f(u), \psi_{2^j}(u - 2^{-j} n) \rangle \right)_{n \in Z}. \tag{20}$$

$D_{2^j} f$ is called the discrete detail signal at the resolution $2^j$. It contains the
difference of information between $A^d_{2^{j+1}} f$ and $A^d_{2^j} f$. As we did in (11), we can
prove that each of these inner products is equal to the convolution of f(x) with
$\psi_{2^j}(-x)$ evaluated at $2^{-j} n$:

$$\langle f(u), \psi_{2^j}(u - 2^{-j} n) \rangle = (f(u) * \psi_{2^j}(-u))(2^{-j} n) \tag{21}$$

Equations (20) and (21) show that the discrete detail signal at the resolution $2^j$ is
equal to a uniform sampling of $(f(u) * \psi_{2^j}(-u))(x)$ at the rate $2^j$:

$$D_{2^j} f = \left( (f(u) * \psi_{2^j}(-u))(2^{-j} n) \right)_{n \in Z}.$$

The wavelet $\psi(x)$ can be viewed as a band pass filter whose frequency bands are
approximately equal to $[-2\pi, -\pi] \cup [\pi, 2\pi]$. Hence the detail signal $D_{2^j} f$
describes f(x) in the frequency bands $[-2^{j+1}\pi, -2^{j}\pi] \cup [2^{j}\pi, 2^{j+1}\pi]$.

We can prove by induction that for any J > 0, the original discrete signal
$A^d_1 f$ measured at the resolution 1 is represented by

$$\left( A^d_{2^{-J}} f, \; (D_{2^j} f)_{-J \le j \le -1} \right).$$

This set of discrete signals is called an orthogonal wavelet representation, and
consists of the reference signal at a coarse resolution $A^d_{2^{-J}} f$ and the detail signals
at the resolutions $2^j$ for $-J \le j \le -1$. It can be interpreted as a decomposition
of the original signal in an orthonormal wavelet basis or as a decomposition of the
signal in a set of independent frequency channels. This independence is due to the
orthogonality of the wavelet functions.

In analogy with the Laplacian pyramid data structure, $A^d_{2^{-J}} f$ provides the
top-level Gaussian pyramid data, and the $D_{2^j} f$ provide the successive Laplacian
pyramid levels. Unlike the Laplacian pyramid, however, there is no oversampling,
and the individual coefficients in the set of data are independent.


2.5 IMPLEMENTATION OF AN ORTHOGONAL WAVELET 
REPRESENTATION 


Here, we describe a pyramidal algorithm to compute the wavelet
representation. We show that $D_{2^j} f$ can be calculated by convolving $A^d_{2^{j+1}} f$ with
a discrete filter G whose form we will characterize.

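For any $n \in Z$, the function $\psi_{2^j}(x - 2^{-j}n)$ is a member of $O_{2^j} \subset V_{2^{j+1}}$.
In the same manner as (12), this function can be expanded in an orthonormal
basis of $V_{2^{j+1}}$:

$$\psi_{2^j}(x - 2^{-j}n) = 2^{-j-1} \sum_{k=-\infty}^{\infty} \langle \psi_{2^j}(u - 2^{-j}n), \phi_{2^{j+1}}(u - 2^{-j-1}k) \rangle \, \phi_{2^{j+1}}(x - 2^{-j-1}k). \tag{23}$$

As we did in (13), by changing variables in the inner product integral we can
prove that

$$2^{-j-1} \langle \psi_{2^j}(u - 2^{-j}n), \phi_{2^{j+1}}(u - 2^{-j-1}k) \rangle = \langle \psi_{2^{-1}}(u), \phi(u - (k - 2n)) \rangle. \tag{24}$$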



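Hence, by computing the inner product of f(x) with the functions of both sides of
(23) and using (24), we obtain

$$\langle f(u), \psi_{2^j}(u - 2^{-j}n) \rangle = \sum_{k=-\infty}^{\infty} \langle \psi_{2^{-1}}(u), \phi(u - (k - 2n)) \rangle \, \langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1}k) \rangle. \tag{25}$$

Let G be the discrete filter whose impulse response is given by

$$\forall n \in Z, \quad g(n) = \langle \psi_{2^{-1}}(u), \phi(u - n) \rangle \tag{26}$$

and let $\tilde{G}$ be the symmetric filter with impulse response $\tilde{g}(n) = g(-n)$. The
transfer function of the filter G is the function $G(\omega)$ defined in Theorem 3, equation
(18). By inserting (26) in (25), we obtain

$$\langle f(u), \psi_{2^j}(u - 2^{-j}n) \rangle = \sum_{k=-\infty}^{\infty} \tilde{g}(2n - k) \, \langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1}k) \rangle. \tag{27}$$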

Equation (27) shows that we can compute the detail signal $D_{2^j} f$ by convolving
$A^d_{2^{j+1}} f$ with the filter $\tilde{G}$ and keeping every other sample of the output. The
orthogonal wavelet representation of a discrete signal $A^d_1 f$ can therefore be
computed by successively decomposing $A^d_{2^{j+1}} f$ into $A^d_{2^j} f$ and $D_{2^j} f$ for $-J \le j \le -1$.
This algorithm is illustrated by the block diagram shown in Figure 5. In practice
the signal $A^d_1 f$ has only a finite number of samples. One method of handling the
border problems uses a symmetry with respect to the first and last samples.

Equation (18) of Theorem 3 can be used to show that the impulse response
of the filter G is related to the impulse response of the filter H by

$$g(n) = (-1)^{1-n} \, h(1 - n). \tag{28}$$

G is the mirror filter of H and is a high-pass filter. G and H are called
quadrature mirror filters. Equation (27) can be interpreted as a high-pass filtering
of the discrete signal $A^d_{2^{j+1}} f$.
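
The mirror relation (28) and the filter-and-decimate step of equation (27) can be
sketched in Python. This is only an illustration, not the Occam implementation
used in this work. The names `mirror_filter` and `analyze`, the choice of the
Daubechies four-tap filter scaled so that its coefficients sum to 1, and the
periodic treatment of the borders (instead of the symmetric extension mentioned
above) are assumptions made for the example.

```python
import math

def mirror_filter(h):
    """Return g with g(n) = (-1)**(1 - n) * h(1 - n), as in relation (28).

    Filters are dicts {index: coefficient}, so g may use negative indices.
    """
    return {1 - k: (c if k % 2 == 0 else -c) for k, c in h.items()}

def analyze(x, filt):
    """Filter and decimate: out[n] = sum_m filt[m] * x[2n + m].

    This equals convolution with the mirrored filter followed by keeping
    every other sample. Indices wrap around (periodic borders, an assumption).
    """
    size = len(x)
    return [sum(c * x[(2 * n + m) % size] for m, c in filt.items())
            for n in range(size // 2)]

# Daubechies four-tap low-pass filter, scaled so that sum(h) = 1 (assumed example)
s3 = math.sqrt(3.0)
h = {0: (1 + s3) / 8, 1: (3 + s3) / 8, 2: (3 - s3) / 8, 3: (1 - s3) / 8}
g = mirror_filter(h)

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx = analyze(signal, h)   # coarse signal, half as many samples
detail = analyze(signal, g)   # detail signal, half as many samples
print(len(approx), len(detail))
```

Because the coefficients of g sum to zero, a constant signal produces a zero
detail signal, which is the high-pass behaviour described above.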

If the original signal has N samples, then the discrete signals $D_{2^j} f$ and $A^d_{2^j} f$
have $2^j N$ samples each. Thus the wavelet representation

$$\left( A^d_{2^{-J}} f, \; (D_{2^j} f)_{-J \le j \le -1} \right)$$

has the same total number of samples as the original approximated signal $A^d_1 f$.
This occurs because the representation is orthogonal. The energy of the samples of
$D_{2^j} f$ gives a measure of the irregularity of the signal at the resolution $2^{j+1}$.
Whenever $A_{2^j} f(x)$ and $A_{2^{j+1}} f(x)$ are significantly different, the signal detail has a
high amplitude.
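
The sample count can be checked with a short Python sketch of the pyramid. The
Haar pair h = (1/2, 1/2), g = (-1/2, 1/2) is assumed here only because it needs no
border handling; the function names are hypothetical and do not come from the
Occam programs of this work.

```python
def haar_step(a):
    """One pyramid level with the Haar filters h = (1/2, 1/2), g = (-1/2, 1/2)."""
    half = len(a) // 2
    approx = [(a[2 * n] + a[2 * n + 1]) / 2 for n in range(half)]
    detail = [(a[2 * n + 1] - a[2 * n]) / 2 for n in range(half)]
    return approx, detail

def wavelet_representation(signal, levels):
    """Return the coarse signal and the list of detail signals.

    details[0] is the detail at resolution 2**-1, details[-1] at 2**-levels.
    """
    details = []
    coarse = list(signal)
    for _ in range(levels):
        coarse, d = haar_step(coarse)
        details.append(d)
    return coarse, details

signal = [float(v) for v in range(16)]          # N = 16 samples
coarse, details = wavelet_representation(signal, 3)
sizes = [len(d) for d in details]               # 2**j * N for j = -1, -2, -3
total = len(coarse) + sum(sizes)
print(sizes, len(coarse), total)
```

For N = 16 and J = 3 the detail signals have 8, 4 and 2 samples and the coarse
signal has 2, so the representation again holds 16 samples.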

2.6 SIGNAL RECONSTRUCTION FROM AN 

ORTHOGONAL WAVELET REPRESENTATION 

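Here we show that the original discrete signal can also be reconstructed
with a pyramid transform. Since $O_{2^j}$ is the orthogonal complement of $V_{2^j}$ in
$V_{2^{j+1}}$,
$\left( \sqrt{2^{-j}}\,\phi_{2^j}(x - 2^{-j}n), \; \sqrt{2^{-j}}\,\psi_{2^j}(x - 2^{-j}n) \right)_{n \in Z}$ is an
orthonormal basis of $V_{2^{j+1}}$. For any n > 0, the function $\phi_{2^{j+1}}(x - 2^{-j-1}n)$ can
thus be decomposed in this basis:

$$\phi_{2^{j+1}}(x - 2^{-j-1}n) = 2^{-j} \sum_{k=-\infty}^{\infty} \langle \phi_{2^j}(u - 2^{-j}k), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle \, \phi_{2^j}(x - 2^{-j}k)$$
$$\qquad + \; 2^{-j} \sum_{k=-\infty}^{\infty} \langle \psi_{2^j}(u - 2^{-j}k), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle \, \psi_{2^j}(x - 2^{-j}k). \tag{29}$$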



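By computing the inner product of each side of equation (29) with the function
f(x), we have

$$\langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle = 2^{-j} \sum_{k=-\infty}^{\infty} \langle \phi_{2^j}(u - 2^{-j}k), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle \, \langle f(u), \phi_{2^j}(u - 2^{-j}k) \rangle$$
$$\qquad + \; 2^{-j} \sum_{k=-\infty}^{\infty} \langle \psi_{2^j}(u - 2^{-j}k), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle \, \langle f(u), \psi_{2^j}(u - 2^{-j}k) \rangle. \tag{30}$$

Inserting (13) and (24) in this expression and using the filters H and G,
defined respectively by (14) and (26), yields

$$\langle f(u), \phi_{2^{j+1}}(u - 2^{-j-1}n) \rangle = 2 \sum_{k=-\infty}^{\infty} h(n - 2k) \, \langle f(u), \phi_{2^j}(u - 2^{-j}k) \rangle \; + \; 2 \sum_{k=-\infty}^{\infty} g(n - 2k) \, \langle f(u), \psi_{2^j}(u - 2^{-j}k) \rangle. \tag{31}$$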

This equation shows that $A^d_{2^{j+1}} f$ can be reconstructed by putting zeros between
each sample of $A^d_{2^j} f$ and $D_{2^j} f$ and convolving the resulting signals with the filters
H and G respectively. The original discrete signal $A^d_1 f$ at the resolution 1 is
reconstructed by repeating this procedure for $-J \le j < 0$.
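
One reconstruction step in the sense of equation (31) can be sketched in Python
and checked against the corresponding decomposition step. The sketch assumes
periodic borders and the Daubechies four-tap filter scaled to sum to 1; the
names `analyze`, `synthesize` and `mirror_filter` are hypothetical and are not
taken from the Occam programs of this work.

```python
import math

def mirror_filter(h):
    """g(n) = (-1)**(1 - n) * h(1 - n); filters are dicts {index: coefficient}."""
    return {1 - k: (c if k % 2 == 0 else -c) for k, c in h.items()}

def analyze(x, filt):
    """Decomposition step: out[n] = sum_m filt[m] * x[2n + m] (periodic borders)."""
    size = len(x)
    return [sum(c * x[(2 * n + m) % size] for m, c in filt.items())
            for n in range(size // 2)]

def synthesize(approx, detail, h, g):
    """Reconstruction step of equation (31).

    Placing sample k at position 2k inserts the zeros; the contributions of
    H and G are accumulated and scaled by the factor 2 of the equation.
    """
    size = 2 * len(approx)
    out = [0.0] * size
    for k in range(len(approx)):
        for m, c in h.items():
            out[(2 * k + m) % size] += 2 * c * approx[k]
        for m, c in g.items():
            out[(2 * k + m) % size] += 2 * c * detail[k]
    return out

s3 = math.sqrt(3.0)
h = {0: (1 + s3) / 8, 1: (3 + s3) / 8, 2: (3 - s3) / 8, 3: (1 - s3) / 8}
g = mirror_filter(h)

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
rebuilt = synthesize(analyze(signal, h), analyze(signal, g), h, g)
error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(error)
```

The reconstruction error is at the level of floating-point rounding, which
reflects the orthogonality of the quadrature mirror filter pair.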

2.7 EXTENSION OF THE ORTHOGONAL WAVELET 
REPRESENTATION TO IMAGES 


The wavelet model can be easily generalized to any dimension n > 0, as
shown by the work of Y. Meyer. Here, we are interested in the two-dimensional
case of image processing applications. The signal is now a finite energy function
$f(x,y) \in L^2(R^2)$. A multiresolution approximation of $L^2(R^2)$ is a sequence of
subspaces of $L^2(R^2)$ which satisfies a straightforward two-dimensional extension of the
properties (2)-(8). Let $(V_{2^j})_{j \in Z}$ be such a multiresolution approximation of
$L^2(R^2)$. The approximation of a signal f(x,y) at a resolution $2^j$ is equal to its
orthogonal projection on the vector space $V_{2^j}$. Theorem 1 is still valid in two
dimensions, and it can be shown that there exists a unique scaling function
$\Phi(x,y)$ whose dilation and translation give an orthonormal basis of each space $V_{2^j}$. Let
$\Phi_{2^j}(x,y) = 2^{2j}\,\Phi(2^j x, 2^j y)$. The family of functions

$$\left( 2^{-j}\,\Phi_{2^j}(x - 2^{-j}n, \; y - 2^{-j}m) \right)_{(n,m) \in Z^2}$$

forms an orthonormal basis of $V_{2^j}$. The factor $2^{-j}$ normalizes each function in the
$L^2(R^2)$ norm. The function $\Phi(x,y)$ is unique with respect to a particular
multiresolution approximation of $L^2(R^2)$.

We consider the particular case of separable multiresolution approximations of
$L^2(R^2)$. For such multiresolution approximations, each vector space $V_{2^j}$ can be
decomposed as a tensor product of two identical subspaces $V^1_{2^j}$ of $L^2(R)$. The sequence of
vector spaces $(V_{2^j})_{j \in Z}$ forms a multiresolution approximation of $L^2(R^2)$ if and only if
$(V^1_{2^j})_{j \in Z}$ is a multiresolution approximation of $L^2(R)$. One can thus easily show that
the scaling function $\Phi(x,y)$ can be written as

$$\Phi(x,y) = \phi(x)\,\phi(y)$$

where $\phi(x)$ is the one-dimensional scaling function of the multiresolution approximation
$(V^1_{2^j})_{j \in Z}$. With a separable multiresolution approximation, extra importance is given to
the horizontal and vertical directions in the image. For many types of images, such as those
from man-made environments, this emphasis is appropriate. The orthonormal basis of $V_{2^j}$ is
then given by

$$\left( 2^{-j}\,\Phi_{2^j}(x - 2^{-j}n, \; y - 2^{-j}m) \right)_{(n,m) \in Z^2} = \left( 2^{-j}\,\phi_{2^j}(x - 2^{-j}n)\,\phi_{2^j}(y - 2^{-j}m) \right)_{(n,m) \in Z^2}. \tag{32}$$



The approximation of a signal f(x,y) at a resolution $2^j$ is therefore characterized
by a set of inner products

$$A^d_{2^j} f = \left( \langle f(x,y), \; \phi_{2^j}(x - 2^{-j}n)\,\phi_{2^j}(y - 2^{-j}m) \rangle \right)_{(n,m) \in Z^2}.$$

Assuming that the camera measures an approximation of the irradiance of a scene
at the resolution 1, let $A^d_1 f$ be the resulting image and N be the number of
pixels. One can easily show that for j < 0, a discrete approximation $A^d_{2^j} f$ has $2^{2j} N$
pixels, since each level halves both the number of rows and the number of
columns. Border problems are handled by supposing that the original image is
symmetric with respect to the horizontal and vertical borders.

As in the one-dimensional case, the detail signal at the resolution $2^j$ is
equal to the orthogonal projection of the signal on the orthogonal complement of
$V_{2^j}$ in $V_{2^{j+1}}$. Let $O_{2^j}$ be this orthogonal complement. The following theorem
gives a simple extension of Theorem 3, and states that we can build an
orthonormal basis of $O_{2^j}$ by scaling and translating three wavelet functions,
$\Psi^1(x,y)$, $\Psi^2(x,y)$ and $\Psi^3(x,y)$.

Theorem 4 : Let $(V_{2^j})_{j \in Z}$ be a separable multiresolution approximation of
$L^2(R^2)$. Let $\Phi(x,y) = \phi(x)\,\phi(y)$ be the associated two-dimensional scaling function.
Let $\psi(x)$ be the one-dimensional wavelet associated with the scaling function $\phi(x)$.
Then, the three "wavelets"

$$\Psi^1(x,y) = \phi(x)\,\psi(y), \qquad \Psi^2(x,y) = \psi(x)\,\phi(y), \qquad \Psi^3(x,y) = \psi(x)\,\psi(y)$$

are such that

$$\left( 2^{-j}\,\Psi^1_{2^j}(x - 2^{-j}n, y - 2^{-j}m), \;\; 2^{-j}\,\Psi^2_{2^j}(x - 2^{-j}n, y - 2^{-j}m), \;\; 2^{-j}\,\Psi^3_{2^j}(x - 2^{-j}n, y - 2^{-j}m) \right)_{(n,m) \in Z^2} \tag{33}$$

is an orthonormal basis of $O_{2^j}$ and

$$\left( 2^{-j}\,\Psi^1_{2^j}(x - 2^{-j}n, y - 2^{-j}m), \;\; 2^{-j}\,\Psi^2_{2^j}(x - 2^{-j}n, y - 2^{-j}m), \;\; 2^{-j}\,\Psi^3_{2^j}(x - 2^{-j}n, y - 2^{-j}m) \right)_{(n,m,j) \in Z^3} \tag{34}$$

is an orthonormal basis of $L^2(R^2)$.

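The difference of information between $A^d_{2^{j+1}} f$ and $A^d_{2^j} f$ is equal to the
orthogonal projection of f(x,y) on $O_{2^j}$, and is characterized by the inner products
of f(x,y) with each vector of an orthonormal basis of $O_{2^j}$. Theorem 4 says that this
difference of information is given by the three detail images

$$D^1_{2^j} f = \left( \langle f(x,y), \; \Psi^1_{2^j}(x - 2^{-j}n, y - 2^{-j}m) \rangle \right)_{(n,m) \in Z^2} \tag{35}$$

$$D^2_{2^j} f = \left( \langle f(x,y), \; \Psi^2_{2^j}(x - 2^{-j}n, y - 2^{-j}m) \rangle \right)_{(n,m) \in Z^2} \tag{36}$$

$$D^3_{2^j} f = \left( \langle f(x,y), \; \Psi^3_{2^j}(x - 2^{-j}n, y - 2^{-j}m) \rangle \right)_{(n,m) \in Z^2} \tag{37}$$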

Just as for one-dimensional signals, one can show that in two dimensions the
inner products which define $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$ are equal to a uniform
sampling of two-dimensional convolution products. Since the three wavelets
$\Psi^1(x,y)$, $\Psi^2(x,y)$ and $\Psi^3(x,y)$ are given by separable products of the functions $\phi$
and $\psi$, these convolutions can be written

$$A^d_{2^j} f = \left( \left( f(x,y) * \phi_{2^j}(-x)\,\phi_{2^j}(-y) \right)(2^{-j}n, 2^{-j}m) \right)_{(n,m) \in Z^2} \tag{38}$$

$$D^1_{2^j} f = \left( \left( f(x,y) * \phi_{2^j}(-x)\,\psi_{2^j}(-y) \right)(2^{-j}n, 2^{-j}m) \right)_{(n,m) \in Z^2} \tag{39}$$

$$D^2_{2^j} f = \left( \left( f(x,y) * \psi_{2^j}(-x)\,\phi_{2^j}(-y) \right)(2^{-j}n, 2^{-j}m) \right)_{(n,m) \in Z^2} \tag{40}$$

$$D^3_{2^j} f = \left( \left( f(x,y) * \psi_{2^j}(-x)\,\psi_{2^j}(-y) \right)(2^{-j}n, 2^{-j}m) \right)_{(n,m) \in Z^2} \tag{41}$$

The expressions (38) through (41) show that in two dimensions, $A^d_{2^j} f$ and the $D^k_{2^j} f$
are computed with separable filtering of the signal along the abscissa and ordinate.

The wavelet decomposition can thus be interpreted as a signal decomposition
in a set of independent, spatially oriented channels. Let us suppose that $\phi(x)$ and
$\psi(x)$ are, respectively, a perfect low-pass and a perfect band pass filter. Figure
2.3(a) shows in the frequency domain how the image $A^d_{2^{j+1}} f$ is decomposed into $A^d_{2^j} f$,
$D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$. The image $A^d_{2^j} f$ corresponds to the lowest frequencies, $D^1_{2^j} f$
gives the vertical high frequencies (horizontal edges), $D^2_{2^j} f$ the horizontal high
frequencies (vertical edges) and $D^3_{2^j} f$ the high frequencies in both directions (the
corners). The arrangement of the $D^k_{2^j} f$ images is shown in Figure 2.3(b).

For any J > 0, an image $A^d_1 f$ is completely represented by the (3J+1)
discrete images

$$\left( A^d_{2^{-J}} f, \; (D^1_{2^j} f)_{-J \le j \le -1}, \; (D^2_{2^j} f)_{-J \le j \le -1}, \; (D^3_{2^j} f)_{-J \le j \le -1} \right).$$

This set of images is called an orthogonal wavelet representation in two
dimensions. The image $A^d_{2^{-J}} f$ is the coarse approximation at the resolution $2^{-J}$
and the $D^k_{2^j} f$ images give the detail signals for different orientations and
resolutions. If the original image has N pixels, each image $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$, $D^3_{2^j} f$
has $2^{2j} N$ pixels (j < 0). The total number of pixels in this new representation is
equal to the number of pixels of the original image, so we do not increase the
volume of data. Once again, this occurs due to the orthogonality of the
representation. In a correlated multiresolution representation such as the Laplacian
pyramid, the total number of pixels representing the signal is increased by a
factor of 2 in one dimension and of 4/3 in two dimensions.
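
The pixel bookkeeping can be checked with a few lines of Python. The sketch
assumes that each level halves both the number of rows and the number of columns,
so that every image of a level holds a quarter of the pixels of the level above;
the 256 x 256 image size and the function names are assumptions made for the
example.

```python
def wavelet_pixel_total(n_pixels, levels):
    """Pixels held by the 3J detail images plus the coarse image."""
    detail = sum(3 * n_pixels // 4 ** k for k in range(1, levels + 1))
    coarse = n_pixels // 4 ** levels
    return detail + coarse

def laplacian_pyramid_total(n_pixels, levels):
    """Pixels held by a pyramid that keeps one full image per level."""
    return sum(n_pixels // 4 ** k for k in range(levels + 1))

N = 256 * 256
J = 3
images = 3 * J + 1
wavelet_total = wavelet_pixel_total(N, J)
pyramid_ratio = laplacian_pyramid_total(N, 8) / N
print(images, wavelet_total == N, round(pyramid_ratio, 4))
```

The wavelet representation holds exactly N pixels for any number of levels, while
the pyramid total approaches 4N/3 as levels are added.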
2.8 DECOMPOSITION AND RECONSTRUCTION ALGORITHMS
IN TWO DIMENSIONS


In two dimensions, the wavelet representation can be computed with a
pyramidal algorithm similar to the one-dimensional algorithm. The two-dimensional
wavelet transform can be seen as a one-dimensional wavelet transform along the x
and y axes. It can be shown that a two-dimensional wavelet transform can be
computed with a separable extension of the one-dimensional decomposition
algorithm. At each step we decompose $A^d_{2^{j+1}} f$ into $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$.
This algorithm is illustrated by the block diagram in Figure 2.1. We first convolve
the rows of $A^d_{2^{j+1}} f$ with a one-dimensional filter, retain every other column, convolve
the columns of the resulting signals with another one-dimensional filter and retain
every other row. The filters used in this decomposition are the quadrature
mirror filters $\tilde{H}$ and $\tilde{G}$.

The structure of application of the filters is given in Figure 2.1. We
compute the wavelet transform of an image $A^d_1 f$ by repeating this process for
$-1 \ge j \ge -J$. This corresponds to a separable conjugate mirror filter decomposition. The
wavelet coefficients have a high amplitude around the image edges and in the
textured areas within a given spatial orientation.
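
One level of the separable decomposition can be sketched in Python as row
filtering followed by column filtering. The Haar filter pair, the periodic
borders, the test image with a single vertical edge, and all function names are
assumptions chosen to keep the example short; they are not the Occam routines of
this work.

```python
def analyze(x, filt):
    """out[n] = sum_m filt[m] * x[2n + m], with periodic borders (an assumption)."""
    size = len(x)
    return [sum(c * x[(2 * n + m) % size] for m, c in filt.items())
            for n in range(size // 2)]

def transpose(img):
    return [list(col) for col in zip(*img)]

def filter_rows(img, filt):
    """Filter every row along x and keep every other column."""
    return [analyze(row, filt) for row in img]

def filter_cols(img, filt):
    """Filter every column along y and keep every other row."""
    return transpose(filter_rows(transpose(img), filt))

def decompose2d(img, h, g):
    low_x = filter_rows(img, h)
    high_x = filter_rows(img, g)
    approx = filter_cols(low_x, h)
    d1 = filter_cols(low_x, g)    # phi(x) psi(y): horizontal edges
    d2 = filter_cols(high_x, h)   # psi(x) phi(y): vertical edges
    d3 = filter_cols(high_x, g)   # psi(x) psi(y): corners
    return approx, d1, d2, d3

h = {0: 0.5, 1: 0.5}     # Haar low-pass, coefficients sum to 1
g = {0: -0.5, 1: 0.5}    # Haar high-pass mirror filter

# 8 x 8 image with one vertical edge between columns 2 and 3
image = [[1.0 if col >= 3 else 0.0 for col in range(8)] for _ in range(8)]
approx, d1, d2, d3 = decompose2d(image, h, g)
print(d2[0])
```

Only the second detail image responds to the vertical edge; the other two detail
images are zero, which matches the orientation selectivity described in
Section 2.7.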

The one-dimensional reconstruction algorithm can also be extended to two
dimensions. At each step, the image $A^d_{2^{j+1}} f$ is reconstructed from $A^d_{2^j} f$,
$D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$. This algorithm is illustrated in Figure 2.2. Between each column of
the images $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$, we add a column of zeros, convolve the
rows with a one-dimensional filter, add a row of zeros between each row of the
resulting image, and convolve the columns with another one-dimensional filter. The
filters used in reconstruction are the quadrature mirror filters H and G. The
image $A^d_1 f$ is reconstructed from its wavelet transform by repeating this process for
$-J \le j \le -1$. If we use floating-point precision for the discrete signals in the
wavelet representation, the reconstruction is of excellent quality. Figure 2.2 shows
the reconstruction of the original image from its wavelet representation.
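
A round trip through one decomposition level and one reconstruction level can be
sketched in Python as follows. As before, the Haar filters, the periodic borders,
the small test image and the function names are assumptions made for the
illustration.

```python
def analyze(x, filt):
    """Decomposition step: out[n] = sum_m filt[m] * x[2n + m] (periodic)."""
    size = len(x)
    return [sum(c * x[(2 * n + m) % size] for m, c in filt.items())
            for n in range(size // 2)]

def synthesize(low, high, h, g):
    """Reconstruction step: zero insertion, filtering with H and G, factor 2."""
    size = 2 * len(low)
    out = [0.0] * size
    for k in range(len(low)):
        for m, c in h.items():
            out[(2 * k + m) % size] += 2 * c * low[k]
        for m, c in g.items():
            out[(2 * k + m) % size] += 2 * c * high[k]
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def decompose2d(img, h, g):
    def cols(im, f):
        return transpose([analyze(c, f) for c in transpose(im)])
    low_x = [analyze(r, h) for r in img]
    high_x = [analyze(r, g) for r in img]
    return cols(low_x, h), cols(low_x, g), cols(high_x, h), cols(high_x, g)

def reconstruct2d(approx, d1, d2, d3, h, g):
    # Rows first: merge the images that differ only in their filter along x.
    low_y = [synthesize(a, b, h, g) for a, b in zip(approx, d2)]
    high_y = [synthesize(a, b, h, g) for a, b in zip(d1, d3)]
    # Then columns: merge the low-pass and high-pass halves along y.
    merged = [synthesize(lc, hc, h, g)
              for lc, hc in zip(transpose(low_y), transpose(high_y))]
    return transpose(merged)

h = {0: 0.5, 1: 0.5}
g = {0: -0.5, 1: 0.5}
image = [[float((3 * r + 5 * c) % 7) for c in range(6)] for r in range(4)]
rebuilt = reconstruct2d(*decompose2d(image, h, g), h, g)
error = max(abs(a - b) for ra, rb in zip(image, rebuilt) for a, b in zip(ra, rb))
print(error)
```

Because the transform is separable, the order in which rows and columns are
processed during reconstruction does not change the result.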
Fig. 2.1. Decomposition of an image $A^d_{2^{j+1}} f$ into the coarse signal $A^d_{2^j} f$ and the
detail images $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$. This algorithm is based on one-dimensional
convolutions of the rows and columns of $A^d_{2^{j+1}} f$ with the one-dimensional quadrature
mirror filters H and G.

Fig. 2.2. Reconstruction of an image $A^d_{2^{j+1}} f$ from $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$ and $D^3_{2^j} f$.
The rows and columns of these images are convolved with the one-dimensional
quadrature mirror filters H and G.

Fig. 2.3(a). Decomposition of the frequency support of the image $A^d_{2^{j+1}} f$ into
$A^d_{2^j} f$ and the detail images $D^k_{2^j} f$.

Fig. 2.3(b). Disposition of the $D^k_{2^j} f$ images of the wavelet representation.






CHAPTER 3 


TRANSPUTER AND OCCAM 

3.1. TRANSPUTER 


3.1.1. Introduction: 

The word Transputer may be interpreted as a contraction of the words
transceiver and computer. The interpretation suggests that the Transputer consists
of a communication system and a computational element.

The Transputer can be used as a basic element in multiprocessor systems. The
chip, being very versatile, has recently received considerable attention and
popularity among concurrent system designers.

There are a number of interesting features which distinguish the Transputer
from any other processor. Above all, the integration of the design and the
advantage of having many features built into a single chip are the
important aspects, which result in the enhancement of processor speed, low chip
count, and simplicity of the system design.

A Transputer can be used in a single processor system or in networks to
build high performance concurrent systems. A typical member of the transputer product
family is a single chip containing processor, memory and point-to-point
communication links. A network of Transputers can be easily constructed using
these links. As a microcomputer, the Transputer is unusual in its ability to
communicate with other Transputers. A variety of different configurations can be
built by hard-wiring Transputers together, with no separate switching and
forwarding network, limited only by the number of links provided on each
Transputer. The current Transputer systems have four to six links. Four links are
enough to allow an enormous range of useful configurations.

Scalability is a major advantage of transputer-based systems; it is much
easier to enhance a system by adding further processors to it than would be the
case with other microprocessors. Compatibility, the ease of replacing one
transputer model with another without major design changes in a system, is also
valuable. This extends to mixing models of transputer within one system.

3.1.2. Architectural details : 

Transputers are often described as RISC processors. The computational
instructions follow RISC principles closely, and they attain the benefits claimed for
RISC architecture. However, they also have a small number of important
non-RISC instructions concerned with scheduling and message passing.

The Transputer is unusual in its ability to execute many software processes
at the same time. A program can be run on a single Transputer, in which case
the concurrency of the processes will be simulated by hardware with no software
intervention. Provided the communication between the subprocesses is not too
complicated, the same program can also be distributed over several processors, in
which case the component processes will be run in real concurrency. Just as in a
single processor, interprocess message passing and the necessary synchronization
(between directly connected Transputers) are achieved in hardware, and no
operating system is needed.

The inbuilt features of the Transputer make up a long list. As well as the
instruction processor and a small amount of on-chip memory, it has a memory
controller, DMA control for four independent fast links, a microcoded multitasking
kernel, and an elapsed time clock (Fig. 3.1). The best known Transputers are the
T212, T414 and T800. The T212 is a 16-bit processor; the other two are 32-bit
processors. The T800 has a full IEEE floating-point processor on chip.

3.1.2.1. Instruction processor :

The processor portion of a Transputer is a traditional microprocessor. Fig.
3.2 shows the 32-bit version of the Transputer. The processor normally obtains its
instructions and data from the internal 4K RAM. Data and instructions can also
be obtained from the links. The processor provides 32-bit addressing; memory is
addressed byte-wise and stored in 4-byte units. Software sees no difference
between on-chip and external memory except in speed.

The design of the Transputer processor exploits the availability of fast on-chip
memory by having only a small number of registers; the CPU contains six
registers which are used in the execution of a sequential process. The small
number of registers, together with the simplicity of the instruction set, enables the
processor to have relatively simple (and fast) data-paths and control logic.

The instructions refer to the evaluation stack implicitly. The use of a stack
removes the need for instructions to respecify the location of their operands.
Statistics gathered from a large number of programs show that a stack of three
registers provides an effective balance between code compactness and
implementation complexity.

3.1.2.2. Instruction set :

The Transputer instruction set is designed for simple and efficient compilation.
All the instructions have the same format and are chosen to give a compact
representation of the operations most frequently occurring in programs. The
instruction size of the Transputer is only 8 bits for most instructions. There are
prefix instructions which allow the operand to be extended to any length.
Measurements show that about 70% of the executed instructions are encoded in a
single byte. There is an extra word of prefetch buffer, so the processor rarely has
to wait for an instruction fetch before proceeding.
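
The way prefix instructions extend an operand can be sketched in Python. The
details below are not given in this text and are stated as assumptions: each
instruction byte is taken to hold a 4-bit function code and a 4-bit data value,
a positive prefix shifts the accumulated operand left by four bits, a negative
prefix complements it before shifting, and the function codes used for `pfix`,
`nfix` and `ldc` (load constant) should be checked against the instruction set
documentation. The helper names `encode` and `decode` are hypothetical.

```python
PFIX, LDC, NFIX = 0x2, 0x4, 0x6   # assumed 4-bit function codes

def encode(function, operand):
    """Return the byte sequence for one instruction with the given operand."""
    last = bytes([(function << 4) | (operand & 0xF)])
    if 0 <= operand < 16:
        return last                                   # fits in a single byte
    if operand >= 16:
        return encode(PFIX, operand >> 4) + last      # positive prefix chain
    return encode(NFIX, (~operand) >> 4) + last       # negative prefix chain

def decode(code):
    """Return (function, operand) for the single instruction held in code."""
    operand_register = 0
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        operand_register |= data
        if function == PFIX:
            operand_register <<= 4
        elif function == NFIX:
            operand_register = (~operand_register) << 4
        else:
            return function, operand_register
    raise ValueError("code ends in a prefix instruction")

short_form = encode(LDC, 5)        # one byte
long_form = encode(LDC, 0x754)     # two prefixes plus the instruction byte
negative = encode(LDC, -1)         # one negative prefix plus the instruction byte
print(short_form.hex(), long_form.hex(), negative.hex())
```

Small operands, which dominate in practice, need no prefix at all, which is
consistent with most executed instructions occupying a single byte.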

3.1.2.3. Memory controller :

The memory controller can drive external dynamic RAM with no additional
circuitry. Together with the controller, the processor can address a linear address
space of 4 Gbytes. The 32-bit wide memory interface uses multiplexed data and
address lines and provides a data rate of up to 4 bytes every 100 nanoseconds (40
Mbytes/sec) for a 30 MHz device. The configurable memory controller provides all
timing, control and DRAM refresh signals for a wide variety of mixed memory
systems.

3.1.2.4. Process scheduler :

The processor provides efficient support for concurrency and communication.
It has a microcoded scheduler which enables any number of concurrent processes to
be executed together, sharing the processor time. This removes the need for a
software kernel. The processor does not need to support the dynamic allocation of
storage, as the compiler is able to perform the allocation of space to the
concurrent processes.

A process starts, performs a number of actions, and then terminates.
Typically a process is a sequence of instructions. A transputer can run several
processes in parallel (concurrently). Processes may be assigned either high or low
priority. At any time, a concurrent process may be

active   - being executed
         - on a list waiting to be executed
inactive - ready to input
         - ready to output
         - waiting until a specified time

The scheduler operates in such a way that inactive processes do not
consume any processor time. The active processes waiting to be executed are held
on a list. The IMS T800 supports two levels of priority. Priority 1 (low priority)
processes are executed whenever there are no active priority 0 (high priority)
processes.

High priority processes are expected to execute for a short time. If one or
more high priority processes are able to proceed, then one is selected and runs
until it has to wait for communication, a timer input, or until it completes
processing.

Each process runs until it has completed its action, but is descheduled
whilst waiting for communication from another process or Transputer, or for a
time delay to complete. In order for several processes to operate in parallel, a low
priority process is only permitted to run for a maximum of two time slices before
it is forcibly descheduled at the next descheduling point.

A process can only be descheduled on certain instructions, known as
descheduling points. As a result, an expression evaluation can be guaranteed to
execute without the process being time sliced part way through. Actual process
switch times are less than 1 µs, as little state needs to be saved and it is not
necessary to save the evaluation stack on rescheduling.

3.1.2.5. Communications :

Communication between the processes is achieved by means of channels. The
communication is point-to-point, synchronized and unbuffered. As a result, a
channel needs no process queue, no message queue and no message buffer.
A channel between two processes executing on the same Transputer is
implemented by a single word in memory; a channel between processes executing
on different Transputers is implemented by point-to-point links. The processor
provides a number of operations to support message passing, the most important
being

input message        output message

The input message and output message instructions use the address of the channel
to determine whether the channel is internal or external. This means that the
same instruction sequence can be used for both hard and soft channels,
allowing a process to be written and compiled without the knowledge of where its
channels are connected.

The communication takes place when both the inputting and outputting
processes are ready. Consequently, the process which first becomes ready must wait
until the second one is also ready.

A process performs an input or output by loading the evaluation stack with
a pointer to a message, the address of a channel, and a count of the number of
bytes to be transferred, and then executing an input message or an output
message instruction.

3.1.2.5.1. Internal channel communication :

At any time, an internal channel (a single word in memory) either holds
the identity of a process, or holds the special value empty. The channel is
initialized to empty before it is used.

When a message is passed using the channel, the identity of the first process
to become ready is stored in the channel, and the processor starts to execute the
next process from the scheduling list. When the second process to use the channel
becomes ready, the message is copied, the waiting process is added to the
scheduling list, and the channel is reset to its initial state. It does not matter
whether the inputting or the outputting process becomes ready first.
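
The single-word channel protocol described above can be modelled in a few lines
of Python. This is a sketch of the behaviour only, not of the microcode: the
classes `Process` and `Channel`, the function `use_channel`, and the idea of
keeping the message in a `data` attribute of the process are all hypothetical
simplifications.

```python
EMPTY = None

class Process:
    def __init__(self, name, sending, data=None):
        self.name = name
        self.sending = sending   # True for an outputting process
        self.data = data         # message to send, or slot for the received one

class Channel:
    def __init__(self):
        self.word = EMPTY        # EMPTY, or the identity of the waiting process

def use_channel(chan, proc, ready_list):
    """Return True if the transfer completed, False if proc must wait."""
    if chan.word is EMPTY:
        chan.word = proc         # first process to arrive is recorded and waits
        return False
    waiting = chan.word
    sender, receiver = (proc, waiting) if proc.sending else (waiting, proc)
    receiver.data = sender.data  # the message is copied
    ready_list.append(waiting)   # the waiting process is rescheduled
    chan.word = EMPTY            # the channel returns to its initial state
    return True

chan, ready = Channel(), []
producer = Process("producer", sending=True, data=42)
consumer = Process("consumer", sending=False)
first = use_channel(chan, producer, ready)    # producer arrives first and waits
second = use_channel(chan, consumer, ready)   # consumer completes the transfer
print(first, second, consumer.data, [p.name for p in ready])
```

The same function handles the case in which the inputting process arrives first,
since the channel word records only which process is waiting.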

3.1.2.5.2. External channel communication :

When a message is passed via an external channel, the processor delegates to
an autonomous link interface the job of transferring the message and deschedules
the process. When the message has been transferred, the link interface causes the
processor to reschedule the waiting process. This allows the processor to continue
the execution of the other processes whilst the external message transfer is taking
place.

3.1.2.6. Communication links :

A link between two transputers is implemented by connecting a link
interface on one Transputer to a link interface on the other Transputer by two
one-directional signal wires, along which data is transmitted serially. The two
wires provide two channels, one in each direction. This requires a simple protocol
to multiplex data and control information. Messages are transmitted as a sequence
of bytes, each of which must be acknowledged before the next is transmitted. A
byte of data is transmitted as a start bit followed by a one bit followed by eight
bits of data followed by a stop bit. An acknowledgement is transmitted as a start
bit followed by a stop bit. An acknowledgement indicates that a process was able
to receive the data byte and that it is able to buffer another byte.
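
The two packet formats can be written out in Python. The text gives the order of
the fields but not their values, so the sketch assumes that the start bit is 1,
the stop bit is 0, and the data bits are sent least significant bit first; the
names `data_packet`, `ACK_PACKET` and `parse_packet` are hypothetical.

```python
START, STOP = 1, 0   # assumed bit values

def data_packet(byte):
    """Start bit, a one bit, eight data bits (LSB first, assumed), stop bit."""
    return [START, 1] + [(byte >> i) & 1 for i in range(8)] + [STOP]

ACK_PACKET = [START, STOP]   # start bit followed by a stop bit

def parse_packet(bits):
    """Return ("ack", None) or ("data", byte) for one complete packet."""
    if bits == ACK_PACKET:
        return ("ack", None)
    if len(bits) == 11 and bits[:2] == [START, 1] and bits[-1] == STOP:
        return ("data", sum(bit << i for i, bit in enumerate(bits[2:10])))
    raise ValueError("not a valid link packet")

packet = data_packet(0xA5)
print(len(packet), parse_packet(packet), parse_packet(ACK_PACKET))
```

Each data byte therefore occupies eleven bit times on the wire, and each
acknowledgement occupies two bit times on the wire in the opposite direction.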

The protocol permits an acknowledgement to be generated as soon as the
receiver has identified a data packet. In this way, the acknowledgement can be
received by the transmitter before all of the data packet has been transmitted
and the transmitter can transmit the next data packet immediately. The IMS
T414 transputer does not implement this overlapping and achieves a data rate of
0.8 Mbytes per second using a link to transfer in one direction. However, by
implementing sufficient overlapping and including sufficient buffering in the link
hardware, the IMS T800 more than doubles this data rate to 1.8 Mbytes per
second in one direction, and achieves 2.4 Mbytes per second when the link carries
data in both directions.

3.1.2.7. The Timer :

The Transputers have two timer clocks which 'tick' periodically. The timers
provide accurate process timing, allowing processes to be descheduled until a
specific time.

In the IMS T800 Transputer, one timer is accessible only to high priority
processes and is incremented every microsecond, cycling completely in
approximately 4295 seconds. The other is accessible only to low priority
processes and is incremented every 64 microseconds, giving exactly 15625 ticks in
one second. It has a full period of approximately 76 hours.
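
The timer figures follow from simple arithmetic, sketched below in Python under
the assumption that each timer is a 32-bit counter (the word length of the
T800); the variable names are chosen only for this example.

```python
WORD_VALUES = 2 ** 32        # distinct values of a 32-bit counter (assumed width)
HIGH_TICK_US = 1             # high priority timer: one tick per microsecond
LOW_TICK_US = 64             # low priority timer: one tick per 64 microseconds

high_period_s = WORD_VALUES * HIGH_TICK_US / 1_000_000
low_ticks_per_second = 1_000_000 // LOW_TICK_US
low_period_hours = WORD_VALUES * LOW_TICK_US / 1_000_000 / 3600

print(round(high_period_s), low_ticks_per_second, round(low_period_hours, 1))
```

The high priority timer wraps after about 4295 seconds (roughly 71.6 minutes) and
the low priority timer after about 76.4 hours.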

3.1.2.8. The Floating point processor :

The IMS T800 has a full IEEE floating point processor on chip. The FPU
operates concurrently with the CPU. This means that it is possible to do address
calculation in the CPU whilst the FPU performs the floating point calculation.
This can lead to significant performance improvements in real applications which
access arrays heavily. Performance depends on many things, including clock and
memory speeds; for the 20 MHz T800 these figures are of the order of
10.0 MIPS and 1 MFLOPS (these are not upper bounds).

As can be noticed from the above, all the links can be active at the same time
as the processor. Thus the Transputer can support nine truly concurrent
activities (each link can transfer data in both directions; of course, memory
accesses have to be interleaved). On a T800, the floating point processor operates
in parallel with the instruction processor, which gives a tenth level of concurrency
at the hardware level (but both the processors are controlled by a single
instruction stream).


3.2. THE OCCAM 


Occam's Razor, "Entities are not to be multiplied beyond necessity", was the
philosophy of the original implementation of Occam.

Occam is the native language for Transputers. The design of Occam has
been heavily influenced by the work on Communicating Sequential Processes (CSP),
which gives a mathematical framework for specifying the behaviour of parallel
processes. Occam is based on the CSP model of computation, but with features
chosen to ensure efficiency of implementation. In this model, an application is
decomposed into a collection of communicating processes, and the processes
communicate by passing messages.

The Occam model is based on the idea of a process. The software building block
is a process. A system is designed in terms of an interconnected set of processes.
Each process can be regarded as an independent unit of design. Its internal design
is hidden and it is completely specified by the messages it sends and receives.
Internally, each process can be designed as a set of communicating processes. The
system design is therefore hierarchically structured. Occam processes do not share
any variables or semaphores. Occam does not require (nor support) shared
memory. Messages pass from exactly one process to the other. There are no
multiple senders or receivers, no broadcasting, and no uncertainty about where a
message came from or where it is going. Messages are unbuffered, so sending and
receiving a message involves momentary synchronization between the two
participating processes. Messages are sent through static channels, as if through a
circuit switched (rather than packet switched) network.

To gain the most benefit from the Transputer architecture, the whole
system can be programmed in Occam. This provides all the advantages of a high
level language, the maximum program efficiency and the ability to use the special
features of the Transputer. The Occam model of concurrency is applicable equally
to processes running on separate processors and to processes running within a
single processor. Since the processor can be controlled only by one instruction
stream, it is evident that the processes in one processor cannot be truly
concurrent. However, the processes can be multiprogrammed, just as on a
mainframe, so that the effect of concurrency is reproduced (apart from speed).
This 'simulated concurrency', as distinct from 'real concurrency', is known as
'gratuitous concurrency'. Within a Transputer, this multiprogramming is handled by
hardware with no need for any operating system.

Occam provides a framework for designing concurrent systems using 
Transputers just in the same way that boolean algebra provides a framework for 
designing electronic systems from logic gates. The system designer's task is eased 
because of the architectural relationship between Occam and the Transputer. A
program running in a Transputer is formally equivalent to an Occam process, and
so a network of Transputers can be described directly as an Occam program. 

Occam, when compiled for execution on a Transputer, is ideal for embedded
multiprocessor systems. Where it is required to exploit concurrency, but still to

activities. This is a useful abstraction. One would like to be able to express an
algorithm in the form of a program which is independent of hardware, so that it
could subsequently be performed using many different networks of processors.
Each implementation would need a specification of how many processors were
needed, how they were to be connected, and which processes were to be installed
on which processors, but the specification ought not to need any change in the
program. This specification is called placement.

In practice, Transputer programs are not completely independent of their
placement. Unless it is carefully designed, a program will only run on one
particular network of Transputers, or on a small number of similar networks. To
run on other networks, the program itself will have to be changed. Sometimes the
changes will be minimal, but other cases may need extensive modifications.
Programmers therefore have to make a conscious effort to write programs which
can easily be run on different configurations.

3.3.3. Non-determinacy:

Sequential programmers are used to the idea of a bug. There are solid bugs
and intermittent bugs. Intermittent bugs are data dependent; a program run on
the same inputs will either work every time or fail every time. Concurrent
programming has a third type of bug: the bug that depends on the relative timing
of concurrent processes. These are very often not repeatable, even if the program
is rerun on the same data.

The problem arises because all the processors are allowed to run at their
own speed. There is no attempt to constrain the processors into lock step. Thus
the order of events can change from one test run to another. It is obviously
important at the design phase not to assume an exact ordering of the events.
Occam is as precise as any other language about what individual processes do, but
it cannot specify the relative timing of concurrent processes (otherwise the whole
point of concurrency would be lost).

It is the programmer's responsibility to ensure that when a program
terminates it has completed the required function, regardless of the order in which
things happened between starting and termination. Since Occam does not guarantee
determinacy, it has been designed to express and handle non-determinacy very
simply, flexibly, and elegantly.

3.3.4. Deadlock:

A classic problem in concurrent systems is deadlock. This is an affliction
whereby one part of a program is waiting for another to do something; the other
is waiting for the first to do something else; and since both are waiting, neither
can do what the other expects. There may, of course, be a set of processes involved,
rather than a pair.

All Transputer deadlocks are essentially the same in that they involve a
closed chain of processes, each trying to communicate with another, but with no
pair of them willing to participate in any one communication. There is an
enormous variety of ways which may lead to deadlock, and that makes it hard to
avoid. It is also difficult to analyze and hard to cure after it is found to occur in
a program. The methods used in practice are to write only simple code, supported by
intuition and back-of-envelope sketches.

3.3.5. Concurrent algorithm design considerations:

Most concurrent algorithms are only loosely related to their sequential
equivalents. The conversion of a sequential algorithm into a concurrent algorithm,
often called 'parallelization', is too ill defined and difficult to be automated and is
often attempted by hand. The following points should be given due consideration
while developing concurrent algorithms.

3.3.5.1. Granularity:

The term 'granularity', which is jargon much used among concurrent
programmers, refers to the size of the tasks that are distributed as concurrent
processes. In a matrix multiplication, all the elements of the result can, in
principle, be evaluated concurrently, because none of the calculations depends on
the result of any other. That would be fine grain concurrency. A more coarse
grain division of labour could evaluate one quarter of the result, working
sequentially on one processor, while evaluating the other three quarters on three
other processors.

Intuitively, one would expect fine-grained concurrency to deliver results 
faster but to use more processors. With the Transputers, a finer-grain division of
work requires more information to be passed between processors. Reducing the 
grain size below some optimum value for a particular problem can incur message 
passing costs which grossly outweigh the expected benefits. 
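
The coarse-grain division of the matrix multiplication described above can be sketched as follows. This is only an illustrative sketch in Python (the thesis programs themselves are in Occam); the function names `partition_rows`, `matmul_block` and `coarse_grain_matmul` are hypothetical, and the "processors" are emulated one after another rather than run concurrently.

```python
def partition_rows(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into n_workers contiguous blocks."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def matmul_block(a, b, rows):
    """Compute only the given rows of a*b: the task of one 'processor'."""
    inner, n_cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(n_cols)]
            for i in rows]


def coarse_grain_matmul(a, b, n_workers=4):
    """Emulate n_workers independent tasks in sequence and stitch the results."""
    result = []
    for rows in partition_rows(len(a), n_workers):
        result.extend(matmul_block(a, b, rows))
    return result
```

With `n_workers=4` each task produces one quarter of the result rows, as in the example in the text. Setting `n_workers` equal to the number of result elements would correspond to the fine-grain extreme, where the cost of passing operands between processors would dominate.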

Granularity is a particular problem in the conversion of existing sequential
code into concurrent methods. It is relatively easy to recognize fine-grain potential
parallelism, e.g. several assignments whose order is immaterial, or a simple loop. It
is much harder to identify the larger units of a program which might safely be
run concurrently.

3.3.5.2. Performance measure and efficiency:

One measure of the quality of a concurrent system is its efficiency in using
the processors. If a single processor solution takes n seconds, it is desirable to
achieve a solution with m processors in n/m seconds. Complete efficiency is never
attainable, but the user tries to get near to it. Efficiency must fall off as more
processors are added.

Occasionally it does happen that an n-processor mechanism solves a problem
in less than m/n seconds, which requires m seconds on a single processor. This
appears to show more than 100% efficiency. There are three possible explanations.
One is that the speed is data dependent: if several sets of data are tried, all of
which require m seconds on one processor, then some may need much less than
m/n seconds on n processors, but others will need more than m/n seconds, so
that the average performance is enhanced by a factor less than n.
Characteristically, these problems have extremely variable solution times even on a single
processor; i.e., the time needed to process one set of data is not obviously related
to the time needed by other sets of data of apparently similar difficulty.

Another possible explanation is that, in recasting an algorithm for concurrent
running, a better method than the original program may have been developed. For
example, the sequential program might be doing more page swapping than the
concurrent algorithm, which results in superlinear efficiency.

It seems, then, that it is sometimes possible for practical Transputer users to
realize gains in performance larger than the number of processors used, but there
is always some excuse for denying that this shows more than 100% efficiency. This
suggests that a more sophisticated measure of efficiency is needed. In reality, the
user should never expect to benefit from super efficiency, for that occurs only
rarely.
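
The efficiency measure discussed in this subsection can be written compactly as follows. The symbols $T_1$ (time on a single processor), $T_p$ (time on $p$ processors), $S$ and $E$ are introduced here only for illustration and are not taken from the original text.

```latex
S = \frac{T_1}{T_p}, \qquad E = \frac{S}{p} = \frac{T_1}{p\,T_p}
```

Under these definitions the ideal case $T_p = T_1/p$ gives $E = 1$ (100% efficiency), and the apparently superlinear cases described above correspond to $E > 1$.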

3.3.5.3. Balancing communication and processing:

Sometimes, the concurrent algorithm may show very poor efficiency. The
reasons could be:

• The computation has not been divided equally: one of the processors still
has to do far more than 1/n th of the work.

• The processes are interdependent; they have to share or communicate
information via links, and some of them spend a lot of time waiting for others,
during which time they cannot continue with the computation.

• Even if time is not wasted in waiting for messages, there may be a lot of time
spent in passing the messages.

These three possibilities are quite distinct. The first one has to be solved
by 'load balancing', i.e., attempting to equalize the computation load on each
processor. The second can be described as 'spurious concurrency'. It can occur by
accident, and it is not easy to predict, analyze or detect. Even when it is known
that it is happening, it is not easily cured. The third reason for inefficiency is a
difficult optimization problem between computation and communication times.

3.3.5.4. Software development:

Transputer software is mostly developed under the Transputer Development
System (TDS) environment supplied by INMOS.

TDS and related systems provide an integrated environment for editing,
compiling and running programs. They are centered on the 'folding editor',
which embodies an elegant and general way of representing and handling large
amounts of Occam text within a small screen. They lack some of the support tools
that are common in other systems (e.g., in UNIX: diff, grep, multitasking, batch
files, aliases, email, .login files).

The TDS is so much an 'integrated environment' that its files are not easily
handled in other systems, not even in systems such as MS-DOS which is acting
as the host or file server for TDS. This makes it hard for the utility software on
the host system to provide the facilities that are missing from the TDS.

Programmers would very much like to have further help with the peculiar
difficulties of concurrent program development. The strongest demand at the
moment, and the one which seems nearest to fulfillment, is for software tools for
profiling, monitoring, and debugging.

3.4 SETUP OF A TRANSPUTER NETWORK SYSTEM 


The major components of a processor network are:

• Host computer (mostly a PC)

• Interface unit

• Processing element array (Transputers)

• Interconnection networks

Host computer:

The host computer is intended to provide system monitoring, data
storage and management. It generates global control codes and object codes of
processor elements. It can be a microcomputer, workstation, or a mainframe. We
have used a PC-AT for this purpose. Processor elements can be accessed by a
procedure call on the host, or through an interactive programmable command
interpreter.

Interface unit:

The interface unit is an interface between the host and the PE array.
The interface unit, connected to the host via the host bus or DMA, has the
function of downloading, uploading, buffering array data and handling interrupts.
It supports high bandwidth communication (accompanying high-speed processing)
between the array and the host. To balance the low bandwidth of the
system I/O against the high bandwidth of the processor array, sufficient buffering is
provided.

Processing element array:

A PE array consists of a number of processing elements with local
memory. Each PE makes effective use of its local data storage, thereby saving
communication time.

Interconnection network:

Interconnections within the PE array are provided by large switching
networks which provide flexibility of interconnection and high speed communication
between the PEs.

3.5 HARDWARE SETUP OF THE TRANSPUTER NETWORK 
AT IP-LAB, IITK. 

A nine-node transputer network has been set up in the image processing
lab at IIT Kanpur. The setup contains two IMS B003 evaluation boards, each
having four transputers, and an IMS B004 Transputer Development System (TDS)
board having one transputer. Of these nine transputers, five are T800's and four are
T414's. The host is a PC-AT (80386). One of the transputers functions as the
interface unit, known as the root processor, with 4 Mbytes of on-board RAM. The
root is connected to the host via a link adapter (IMS C012). Most of the time the
root transputer will be used for executing the I/O routines required for the host
file server. PEs are individual transputers, each with 256 Kbytes of on-board RAM.
Transputer links are connected by a programmable switch known as the Linkswitch
(IMS C004).




Link adapter:

It connects the host and the root transputer. It provides for full duplex
transputer link communication by converting bi-directional serial link data into
parallel data streams.

Host:

The host serves as a file server, as memory management and file systems are
not supported by transputers. This facility of the host is accessed by a program called
the server. The server reads a DOS file to determine the network configuration, the
programs to be loaded and the boot order. The host loads the network via the
root transputer by sending loading information to it, which in turn boots and
loads the transputers connected to it, which again pass the boot and loading
information forward, and so on until the whole network has been booted and
loaded.

Booting of the transputer network:

A communication protocol exists between the host and the transputer
network to direct the code to the desired place in each transputer. The bootstrap
code for each transputer is sent first. After all transputers are booted, the code of
each of the procedures allocated to processors is exported to the network, preceded
by the necessary routing and loading information. Following this, the code which calls
the procedures is sent to each processor.

The Linkswitch:

The transputer network is interconnected using a Linkswitch (IMS C004). It is a
transparent programmable linkswitch designed to provide a full crossbar switch
between 32 link inputs and 32 link outputs. It uses the capabilities of VLSI to
offer simple, easy to use and cheap interconnections for computer systems. It
introduces an average delay of 1.75 bit times on the signal. Linkswitches can
be cascaded to any depth without loss of signal integrity and can be used to
construct configurable networks of arbitrary size. The switch is programmed via a
separate link called the configuration link. In the setup, LINK0 of the root
transputer is connected to the host (via the link adapter) and LINK1 is used as the
configuration link for the linkswitch. So, only LINK2 and LINK3 on the root
transputer are free. Also, since the linkswitch supports only 32 link connections,
two links of the last (9th) transputer cannot be used at present (only LINK0 and
LINK1 can be used on the last transputer).
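
The link count behind the last two sentences can be checked with a few lines of arithmetic. The sketch below is illustrative only; the constant and function names are hypothetical, and the figure of four links per transputer is taken from the LINK0 to LINK3 naming used above.

```python
LINKS_PER_TRANSPUTER = 4   # LINK0..LINK3, as named in the text
N_TRANSPUTERS = 9          # root processor + 8 processing elements
SWITCH_CAPACITY = 32       # link connections supported by one linkswitch
ROOT_RESERVED = 2          # root LINK0 -> host, root LINK1 -> configuration link


def link_budget(n_transputers=N_TRANSPUTERS,
                links_each=LINKS_PER_TRANSPUTER,
                reserved=ROOT_RESERVED,
                capacity=SWITCH_CAPACITY):
    """Return (total links, links wanting a switch port, links left unconnected)."""
    total = n_transputers * links_each
    wanting_port = total - reserved
    unconnected = max(0, wanting_port - capacity)
    return total, wanting_port, unconnected
```

Calling `link_budget()` gives 36 links in total, 34 of which would need a switch connection, so 2 remain unconnected. This matches the statement that two links of the ninth transputer cannot be used.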

3.6 CODE DEVELOPMENT 


Occam is used for the program development under the TDS environment.
The same program can be developed on a single node or on a multi-node
transputer network. On a single node, the program would be developed as a parallel
program of n processes, using soft channels for communication between
the parallel processes. On a multi-node network some changes have to be made:

1. The individual processes are to be defined as procedures whose parameters
should be only the hard channels, and should be compiled separately.

2. A mapping of the Occam channels to the transputer hard links should be
given. This gives the network configuration information.

3. The separately compiled procedures are 'PLACED' on individual
transputers and the program is then compiled to generate a network code file.

For running the program, the network linkswitch should be configured. This can
be done by a software program which transmits the appropriate code on the
configuration input link of the linkswitch.




The single/multi-node program can be tested from the TDS environment.
After a successful run, it can be made a stand-alone program bootable by the
external host by using the alien file server library routines available in the TDS
environment.











Fig. 3.2 INMOS T800 Transputer












CHAPTER 4 


IMPLEMENTATION OF MULTIRESOLUTION
ALGORITHMS ON A TRANSPUTER NETWORK 


4.1 INTRODUCTION


The decomposition algorithm which has been developed for computing the
multiresolution Wavelet Transform of images has already been discussed in Chapter
2. The reconstruction algorithm for getting the image back from the
detail and coarse signals has also been studied in Chapter 2. In any
image processing application, the time taken by the procedure is of crucial
importance, especially so in real time applications. The concept of parallel
processing comes in handy here. Our equipment consists of a root processor and 8
other processors. The root processor and 4 other processors are of the T800 type
while the remaining 4 are of the T414 type. In this chapter, we shall discuss
how these algorithms have been implemented using these Transputers.

4.2 THE ALGORITHMS 

It would be useful to briefly recapitulate the decomposition and
reconstruction algorithms. They are as follows:

A two-dimensional wavelet transform can be computed with a separable
extension of the one-dimensional decomposition algorithm. At each step we
decompose $A^d_{2^{j+1}} f$ into $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$,
and $D^3_{2^j} f$. This algorithm is illustrated by
the block diagram in Figure 2.1. We first convolve the rows of $A^d_{2^{j+1}} f$ with a
one-dimensional filter, retain every other row, convolve the columns of the
resulting signals with another one-dimensional filter and retain every other column.
The filters used in this decomposition are the quadrature mirror filters H and G.
The structure of application of the filters is given in Fig. 2.1. We
compute the wavelet transform of an image $A^d_1 f$ by repeating this process for
$-1 \ge j \ge -J$. This corresponds to a separable conjugate mirror filter decomposition. The
wavelet coefficients have a high amplitude around the image edges and in the
textured areas within a given spatial orientation.

The one-dimensional reconstruction algorithm can also be extended to two
dimensions. At each step, the image $A^d_{2^{j+1}} f$ is reconstructed from
$A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$, and $D^3_{2^j} f$. This algorithm is
illustrated in Figure 2.2. Between each column of
the images $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$, and $D^3_{2^j} f$, we add a
column of zeros, convolve the
rows with a one-dimensional filter, add a row of zeros between each row of the
resulting image, and convolve the columns with another one-dimensional filter. The
filters used in reconstruction are the quadrature mirror filters H and G. The
image $A^d_1 f$ is reconstructed from its wavelet transform by repeating this process for
$-J \le j \le -1$. If we use floating-point precision for the discrete signals in the
wavelet representation, the reconstruction is of excellent quality. Figure 2.2 shows
the reconstruction of the original image from its wavelet representation.
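
One level of the separable decomposition and its inverse can be sketched as follows for the Haar case. This is an illustrative sketch in Python, not the Occam implementation of this thesis. It assumes the two-tap Haar filter h(0) = h(1) = 0.5 tabulated later in this chapter, a high-pass filter G = (-0.5, 0.5) chosen here as its mirror (an assumption, since G is not listed in this section), and an even-sized image. With two-tap filters the "convolve and retain every other sample" step reduces to a weighted sum over non-overlapping pairs, and inserting zeros before convolving reduces to the pairwise expansion in `synthesise`. All function names are hypothetical.

```python
H = (0.5, 0.5)    # Haar low-pass taps h(0), h(1)
G = (-0.5, 0.5)   # assumed high-pass mirror of H (not given in the text)


def analyse(signal, taps):
    """Two-tap filtering followed by retaining one output per sample pair."""
    return [taps[0] * signal[i] + taps[1] * signal[i + 1]
            for i in range(0, len(signal), 2)]


def synthesise(low, high):
    """Rebuild a 1-D signal from its low and high bands (gain 2 per dimension)."""
    out = []
    for a, d in zip(low, high):
        out.append(2 * (H[0] * a + G[0] * d))
        out.append(2 * (H[1] * a + G[1] * d))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def decompose(image):
    """One 2-D level: filter rows, then columns. Returns (A, D1, D2, D3)."""
    low = transpose([analyse(row, H) for row in image])    # list of columns
    high = transpose([analyse(row, G) for row in image])
    a = transpose([analyse(col, H) for col in low])
    d1 = transpose([analyse(col, G) for col in low])
    d2 = transpose([analyse(col, H) for col in high])
    d3 = transpose([analyse(col, G) for col in high])
    return a, d1, d2, d3


def reconstruct(a, d1, d2, d3):
    """Invert decompose(): undo the column pass, then the row pass."""
    low = transpose([synthesise(x, y)
                     for x, y in zip(transpose(a), transpose(d1))])
    high = transpose([synthesise(x, y)
                      for x, y in zip(transpose(d2), transpose(d3))])
    return [synthesise(x, y) for x, y in zip(low, high)]
```

The sketch undoes the column pass before the row pass simply because that mirrors `decompose`. The gain of 2 in each one-dimensional synthesis pass gives an overall factor of 4 in two dimensions for this filter normalisation. Longer filters such as the spline filters would additionally need a border-handling rule, which is not attempted here.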

4.3 TRANSPUTER NETWORK CONFIGURATION 

Different configurations are possible when we use eight Transputers. The
three possible configurations are shown in Fig. 4.1. However, in order to
implement the above decomposition and reconstruction algorithms, we have used
the configuration depicted in Fig. 4.1(c). The configuration being considered is a
linear array of the processors with data flow along a linear path. We have utilized
the concept of data-partitioning or, in parallel processing parlance, "coarse grain
concurrency".

4.4 IMPLEMENTATION PROCEDURE 


4.4.1 Image:

The image that we have taken up for consideration is the standard Baboon
image. This particular image has been selected because the contrast between the
various parts of the image is considerable. The size of the image that has been
taken up for decomposition is 128x128 or smaller, because the system
could not support the processing of larger images. In the description of the
algorithm implementation below, MxM represents the size of the image. In other
words, we have taken the value of M to be 128, 64, 32, 16 etc.

4.4.2 Choice of Basis functions:

The decomposition and reconstruction of the above image have been studied
for three special basis functions, namely the Haar basis, the Linear Spline basis
and the Cubic Spline basis. The quality of reconstruction for all three bases
is compared to see which one of the three gives the best
decomposition and subsequent reconstruction. The mathematical formulation of
the various bases is as follows [for calculation of the filter coefficients, see
Appendix]:

4.4.2.1 Haar Basis

For the Haar basis function, the h(n)'s have been calculated and tabulated as
follows:

    n      h(n)
    0      0.5
    1      0.5


4.4.2.2 Linear Spline

For the Linear Spline basis function, the h(n)'s have been calculated and
tabulated as follows:

    n      h(n)
   -1      0.041
    0      0.25
    1      0.416
    2      0.25
    3      0.041

4.4.2.3 Cubic Spline

For the Cubic Spline basis function, the h(n)'s have been calculated and
tabulated as follows:

    n      h(n)        n      h(n)
    0      0.542       6      0.012
    1      0.307       7     -0.013
    2     -0.035       8      0.006
    3     -0.078       9      0.006
    4      0.023      10     -0.003
    5     -0.030      11     -0.002


The transfer functions of the filters $H$, $G$, $\tilde{H}$ and $\tilde{G}$
corresponding to all three basis functions above have been calculated on the basis of
the formulas given earlier.

4.4.3 Addition Of Noise in the Original Image 

Some noise has been added to the original image. The decomposition
and reconstruction of this corrupted image have then been carried out in order to investigate
whether such a procedure using these bases results in an improvement of the
signal-to-noise ratio (SNR) or not. The various types of noise which have been
included are

(a) Gaussian noise with various means and variances.

and outputs (9-N) rows to the processor (N-1). Hence at the root
processor, we once again get back 8 rows each containing M columns.
Step 7. The above procedure (from Step 3) is repeated for each of the M/8 parts.

Step 8. All the resultant parts are combined to get a data array of size
        (M x M/2).

Step 9. The data array is transposed in the root processor so that the rows
        become columns and vice-versa.

Step 10. The array is partitioned into M/16 parts, each having 8 rows.

Step 11. The above procedure (from Step 3) is repeated again.

Therefore, finally we get a data array of size (M/2 x M/2). Depending
upon the filters used in the two steps, the final output is either the coarse signal
or any of the three detail signals. The coarse signal $A^d_{2^j} f$ can then be further
decomposed by using a similar procedure. We thus get the wavelet representation
of the original image up to whatever resolution we please.
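
The movement of one 8-row part through the linear array can be sketched as follows. This is an illustrative Python simulation, not the Occam code of this thesis. It assumes that processor N keeps the first row it receives and forwards the rest (the same rule the reconstruction steps state explicitly), and it runs the processors one after another, whereas on the network they filter their rows concurrently. The function names and the `row_filter` argument are hypothetical.

```python
N_PROCESSORS = 8


def forward_pass(rows):
    """Processor N (1..8) keeps the first row it receives and forwards the rest."""
    kept = {}
    in_flight = list(rows)              # exactly 8 rows leave the root
    for n in range(1, N_PROCESSORS + 1):
        kept[n] = in_flight[0]
        in_flight = in_flight[1:]       # 8 - n rows travel on to processor n + 1
    return kept


def backward_pass(results):
    """Results return along the chain; processor N sends 9 - N rows to N - 1."""
    in_flight = []
    emitted = {}
    for n in range(N_PROCESSORS, 0, -1):
        in_flight = [results[n]] + in_flight   # own row goes in front
        emitted[n] = len(in_flight)
    return in_flight, emitted


def process_part(rows, row_filter):
    """Distribute one part, filter one row per processor, collect at the root."""
    kept = forward_pass(rows)
    results = {n: row_filter(row) for n, row in kept.items()}
    return backward_pass(results)
```

For example, `process_part(rows, row_filter)` with eight input rows returns the eight filtered rows in their original order, together with a record showing that processor N emitted 9 - N rows on the way back, so that the root receives all 8 rows from processor 1.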

4.4.4 Implementation of Reconstruction Algorithm

Step 1. The four signals, viz., $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$,
        $D^3_{2^j} f$ (say, each of size M/2 x M/2) are taken into the root
        processor.

Step 2. Alternate columns of zeros are introduced in each of the data arrays so
        that the size of each array now becomes M/2 x M.

Step 3. All four arrays are partitioned into M/8 parts of 4 rows each.

Step 4. The first part is taken and the first 4 rows of each array are input to
        processor No. 1. Processor No. 1 accepts all the rows but
        keeps just two rows: one of $A^d_{2^j} f$ and the other of $D^1_{2^j} f$.
        It lets the other rows pass through to the next processor. In a similar
        manner, the first four processors keep 2 rows, one concerning
        $A^d_{2^j} f$ and the other concerning $D^1_{2^j} f$. The next four
        processors keep one row each of $D^2_{2^j} f$ and $D^3_{2^j} f$
        respectively.

Step 5. The rows of $A^d_{2^j} f$, $D^1_{2^j} f$, $D^2_{2^j} f$, and $D^3_{2^j} f$
        are convolved with the filters G, H, G, and H respectively in the
        concerned processors.

Step 6. The results of these convolutions are added in the concerned processors
        themselves, as shown in Figure 2.2. So, at this stage we have one row
        of M columns in each of the processors.

Step 7. The rows are fed back to the root processor, starting from the end
        processor and passing through each of the in-between processors.

Step 8. The above procedure is repeated (from Step 4) for each of the M/8 parts
        so that at the end we are left with two sets, each of size M/2 x M,
        at the root processor.

Step 9. We now introduce alternate rows of zeros in the two sets so that their
        new size becomes M x M.

Step 10. Both the sets are again partitioned into M/8 parts of 8 rows each.

Step 11. The first 8 rows of both the sets are transmitted to the transputer
         array. Each transputer keeps one row of both the sets and lets the
         other rows go to the next processor.

Step 12. In each processor, the row of the first set is convolved with the
         filter G while the row of the second set is convolved with the filter
         H. The results of the two convolutions in each processor are added.

Step 13. The rows are fed back to the root processor starting from the
         end processor. They are combined together in the root processor.

Step 14. This process is repeated for each part (from Step 11) so that finally we
         have one data array of size M x M.
Step 15. Each data element is multiplied by 4 in order to get the signal
         $A^d_{2^{j+1}} f$.

Fig. 4.1 Various configurations of an eight-node transputer network, panels (a),
(b) and (c). Diagram labels: Host PC, Root Processor, Forward Links, Backward Links.


























CHAPTER 5 


RESULTS AND CONCLUSIONS

5.1 RESULTS 


The decomposition and reconstruction programs were run both on a
single-node transputer and on the eight-node transputer network. It was found
that there was a dramatic decrease in the time taken for program execution when
the eight-node transputer network was used. On the average, the time taken by
the eight-node transputer network was about one-sixth of the time taken by the single
transputer to run the same program.

For a (64x64) image, the time taken by the eight-node transputer network
for decomposition was 1.021 seconds on the average, with minor variations in the case
of different basis functions. The same image took 7.27 seconds when decomposed
on a single-node network. Similarly, an image of (128x128) size took 24.75 seconds
and 141.36 seconds on the eight-node transputer network and the single-node
transputer respectively.

For reconstructing an image of size (64x64), the eight node transputer 
network took 2.85 seconds on the average, while the single node transputer took 
17.36 seconds on the average. 
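
The speedup and processor efficiency implied by the timings reported above can be worked out as follows. This is a small illustrative calculation in Python; the dictionary keys and function names are hypothetical, and the efficiency is computed on the assumption that the eight processing nodes are the ones counted (the root processor is not counted separately).

```python
# (single-node time, eight-node time) in seconds, as reported in the text
TIMINGS = {
    "decomposition 64x64": (7.27, 1.021),
    "decomposition 128x128": (141.36, 24.75),
    "reconstruction 64x64": (17.36, 2.85),
}
N_NODES = 8


def speedup(t_single, t_network):
    """Ratio of single-node time to network time."""
    return t_single / t_network


def efficiency(t_single, t_network, n_nodes=N_NODES):
    """Speedup divided by the number of nodes (1.0 would be ideal)."""
    return speedup(t_single, t_network) / n_nodes


for name, (t1, t8) in TIMINGS.items():
    print(f"{name}: speedup {speedup(t1, t8):.2f}, "
          f"efficiency {efficiency(t1, t8):.0%}")
```

The three speedups come out at roughly 7.1, 5.7 and 6.1, which is in line with the average factor of about six quoted above, and the corresponding efficiencies lie between about 70% and 90%.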

The decomposition and reconstruction of the images have been carried out
with different basis functions, viz., the Haar basis, the linear spline and the cubic
spline. The quality of the reconstructed images was inspected on the screen. However, to
give a quantitative measure of the quality, it was measured in terms of the
normalized mean square error (nmse), which is defined as follows:

$$\mathrm{nmse} = \frac{\mathrm{Var}(o - r)}{\mathrm{Var}(o)} \times 100\ \%$$

where Var(o - r) represents the variance of the original image minus the
reconstructed image.

Var(i) for any image i is defined as $E((x - m)^2)$, where E is the expected
value, m is the mean of the gray values of all the pixels and x is the gray value
of any pixel.
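
The nmse measure defined above can be computed as in the following sketch. It is an illustration in Python rather than the code used for the reported figures; the function names are hypothetical, images are taken to be lists of rows of gray values, and the expectation E is taken as the plain average over all pixels (an assumption about the estimator).

```python
def variance(values):
    """E((x - m)^2) with E taken as the average over all values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def nmse_percent(original, reconstructed):
    """Var(o - r) / Var(o) * 100 for two images of equal size."""
    o = [p for row in original for p in row]
    r = [p for row in reconstructed for p in row]
    diff = [a - b for a, b in zip(o, r)]
    return 100.0 * variance(diff) / variance(o)
```

Because the measure is built on variances rather than raw squared differences, a uniform brightness offset between the two images does not contribute to it; only pixel-to-pixel differences that vary across the image do.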

On both counts, i.e., through observation as well as through the
criterion of normalized mean square error, it was found that the reconstruction is
best in the case of the cubic spline and worst in the case of the Haar basis. The linear spline
quality comes somewhere in between. The normalized mean square errors (nmse) for
the various cases are given below:

Case 1. Between the original image and the reconstructed image using the Haar
basis function, nmse = 14.12 %

Case 2. Between the original image and the reconstructed image using the linear
spline basis function, nmse = 10.78 %

Case 3. Between the original image and the reconstructed image using the cubic
spline basis function, nmse = 8.82 %

Case 4. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 2.0 and the reconstructed image using the Haar basis function,
nmse = 15.65 %

Case 5. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 2.0 and the reconstructed image using the linear spline
basis function, nmse = 12.47 %

Case 6. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 2.0 and the reconstructed image using the cubic spline
basis function, nmse = 10.63 %

Case 7. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 1.0 and the reconstructed image using the Haar basis function,
nmse = 14.96 %

Case 8. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 1.0 and the reconstructed image using the linear spline
basis function, nmse = 11.68 %

Case 9. Between the original image corrupted by Gaussian noise of mean 0.0
and standard deviation 1.0 and the reconstructed image using the cubic spline
basis function, nmse = 9.75 %

Case 10. Between the original image corrupted by impulsive noise of
magnitude 125.0 and the reconstructed image using the Haar basis function, nmse =
15.89 %

Case 11. Between the original image corrupted by impulsive noise of
magnitude 125.0 and the reconstructed image using the linear spline basis function,
nmse = 13.38 %

Case 12. Between the original image corrupted by impulsive noise of
magnitude 125.0 and the reconstructed image using the cubic spline basis function,
nmse = 11.98 %

5.2 CONCLUSIONS 


This work has described a mathematical model for the computation and
interpretation of the concept of a multiresolution representation. It has also
described a way of implementing this mathematical concept with the help of
transputer networks. We explained how to extract the difference of information
between successive resolutions and thus define a wavelet representation. This
representation is computed by decomposing the original signal using a wavelet
orthonormal basis, and can be interpreted as a decomposition using a set of
independent frequency channels having a spatial orientation tuning. A wavelet
representation lies between the spatial and Fourier domains. There is no redundant
information because the wavelet functions are orthogonal. The computation is
efficient due to the existence of a pyramidal algorithm based on convolutions with
quadrature mirror filters. The original signal can be reconstructed from the wavelet
decomposition with a similar algorithm. As far as wavelet decomposition and
reconstruction of degraded images is concerned, there is no evidence of
any noise reduction in the process for the types of noise and basis functions
considered. This does not, of course, conclusively rule out the possibility of
noise reduction through multiresolution processing. It calls for further
investigation.



5.3 SUGGESTIONS FOR FURTHER WORK 


In this work, the wavelet decomposition and reconstruction have been 
implemented on a linear array of transputers. Hence, it can be investigated 
whether any other configuration of transputers gives enhanced performance. 
In real-time applications, it is essential to reduce the time taken in 
processing the image. Several other configurations of the transputers are possible 
which might reduce the time taken, and it would be worthwhile looking into them. 

Further, the choice of basis functions in this work has been arbitrary. It 
would be worthwhile to use other basis functions for forming the wavelet 
representation. Moreover, the best basis functions could be identified on the 
basis of the quality of the reconstructed image. 

In this thesis, we investigated the effect of the three types of basis 
functions on images degraded by Gaussian and impulsive noise. Our objective 
was to find out whether there is any noise reduction in the reconstructed image. 
However, for the types of noise and the types of basis functions used, there was 
no evidence of noise reduction. It would be interesting to find out whether any 
other type of noise or any other type of basis functions can lead to noise 
reduction. 




Plate 1. The Original Baboon image. 

Plate 2. The first detail image using Haar basis function. 

Plate 3. The second detail image using Haar basis function. 

Plate 4. The third detail image using Haar basis function. 

Plate 5. ... image using Haar basis function. 

Plate 6. The Reconstructed image using Haar basis function. 

Plate 7. The Reconstructed image using Linear Spline basis function. 

Plate 8. The Reconstructed image using Cubic Spline basis function. 

Original image degraded by Gaussian noise. 

Plate 11. Reconstruction of the image corrupted by Gaussian noise using 
Linear Spline basis function. 

Plate 12. Reconstruction of the image corrupted by Gaussian noise using 

Plate 14. Reconstruction of the image corrupted by Impulsive noise using 

APPENDIX 


DESIGN OF WAVELETS 

The wavelet functions $\psi(x)$ have been constructed from 
orthogonal basis functions. Amongst the known orthogonal basis 
functions are Linear Spline, Haar basis, Quadratic Spline, and 
Cubic Spline. 


We start with a function $\phi$ such that $\{\phi(x-n)\}_{n \in Z}$ are an orthonormal 
basis for $V_0$. Since $\phi \in V_0 \subset \mathrm{span}\{\phi(2\cdot - n)\}$, there exist $c_n$ such 
that 

$$\phi(x) = \sum_n c_n \, \phi(2x-n)$$

Define then 

$$\psi(x) = \sum_n (-1)^n \, c_{n+1} \, \phi(2x+n)$$

Then $\{\psi_{mn},\ m, n \in Z\}$ constitute an orthonormal basis of 
wavelets for $L^2(R)$. 
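
The passage from the coefficients $c_n$ of the scaling function to the wavelet can be written out mechanically. The following Python sketch reads the definition above as $\psi(x) = \sum_n (-1)^n c_{n+1} \phi(2x+n)$; this reading is consistent with the Haar and Linear Spline wavelets derived in this appendix. The names `wavelet_weights`, `psi` and `haar_phi` are hypothetical.

```python
def wavelet_weights(c):
    """Map {k: c_k} to {n: (-1)**n * c_{n+1}}, the weight of phi(2x + n) in psi(x)."""
    weights = {}
    for k, c_k in c.items():
        n = k - 1                        # c_k plays the role of c_{n+1}
        sign = 1 if n % 2 == 0 else -1   # (-1)**n, also valid for negative n
        weights[n] = sign * c_k
    return weights

def haar_phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def psi(x, phi, c):
    """Evaluate the wavelet built from the scaling function phi and coefficients c."""
    return sum(w * phi(2.0 * x + n) for n, w in wavelet_weights(c).items())
```

For the Haar coefficients $c_0 = c_1 = 1$ the weights are +1 on $\phi(2x)$ and -1 on $\phi(2x-1)$, so `psi(0.25, haar_phi, {0: 1, 1: 1})` evaluates to 1.0 and `psi(0.75, haar_phi, {0: 1, 1: 1})` to -1.0.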

Now we take up orthogonal basis functions one by one. 
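1 Haar Basis 

The orthogonal function is given by 

$$\phi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$

$\phi(x)$ is also expressed as 

$$\phi(x) = \phi(2x) + \phi(2x-1)$$

Since $\phi(x) = c_0 \, \phi(2x) + c_1 \, \phi(2x-1)$, therefore 

$$c_0 = 1, \qquad c_1 = 1$$

By equation (3.2), 

$$\psi(x) = \phi(2x) - \phi(2x-1)$$

Hence 

$$\psi(x) = \begin{cases} 1, & 0 \le x < 1/2 \\ -1, & 1/2 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$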


2 Linear Spline 

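The function is given as 

$$\phi(x) = \begin{cases} x, & 0 \le x < 1 \\ 2 - x, & 1 \le x < 2 \\ 0, & \text{otherwise} \end{cases}$$

$\phi(x)$ is also expressed as 

$$\phi(x) = \tfrac{1}{2} \phi(2x) + \phi(2x-1) + \tfrac{1}{2} \phi(2x-2)$$

Hence 

$$c_0 = \tfrac{1}{2}, \qquad c_1 = 1, \qquad c_2 = \tfrac{1}{2}$$

$$\psi(x) = -\tfrac{1}{2} \phi(2x+1) + \phi(2x) - \tfrac{1}{2} \phi(2x-1)$$

Hence 

$$\psi(x) = \begin{cases} -(x + \tfrac{1}{2}), & -\tfrac{1}{2} \le x \le 0 \\ (3x - \tfrac{1}{2}), & 0 \le x \le \tfrac{1}{2} \\ (\tfrac{5}{2} - 3x), & \tfrac{1}{2} \le x \le 1 \\ -(\tfrac{3}{2} - x), & 1 \le x \le \tfrac{3}{2} \\ 0, & \text{otherwise} \end{cases}$$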
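
The piecewise form of the Linear Spline wavelet can be checked numerically against the combination of shifted and scaled scaling functions. The Python sketch below does this on a dyadic grid. The piecewise expressions in `psi_piecewise` are those obtained by expanding $-\tfrac{1}{2}\phi(2x+1) + \phi(2x) - \tfrac{1}{2}\phi(2x-1)$ interval by interval, and all function names are hypothetical.

```python
def hat(x):
    """Linear spline scaling function: x on [0, 1), 2 - x on [1, 2), 0 elsewhere."""
    if 0.0 <= x < 1.0:
        return x
    if 1.0 <= x < 2.0:
        return 2.0 - x
    return 0.0

def hat_refined(x):
    """Right-hand side of the two-scale relation with c0 = 1/2, c1 = 1, c2 = 1/2."""
    return 0.5 * hat(2 * x) + hat(2 * x - 1) + 0.5 * hat(2 * x - 2)

def psi_from_hat(x):
    """Wavelet as a combination of scaled, shifted hat functions."""
    return -0.5 * hat(2 * x + 1) + hat(2 * x) - 0.5 * hat(2 * x - 1)

def psi_piecewise(x):
    """The same wavelet written interval by interval."""
    if -0.5 <= x < 0.0:
        return -(x + 0.5)
    if 0.0 <= x < 0.5:
        return 3 * x - 0.5
    if 0.5 <= x < 1.0:
        return 2.5 - 3 * x
    if 1.0 <= x < 1.5:
        return -(1.5 - x)
    return 0.0

# Dyadic grid on [-1, 2]; every grid point is exactly representable in binary.
grid = [k / 64.0 - 1.0 for k in range(193)]
max_gap_psi = max(abs(psi_from_hat(x) - psi_piecewise(x)) for x in grid)
max_gap_phi = max(abs(hat(x) - hat_refined(x)) for x in grid)
```

Both `max_gap_psi` and `max_gap_phi` come out as zero on this grid, which confirms the coefficients $c_0$, $c_1$, $c_2$ and the four linear pieces of the wavelet.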


DESIGN OF FILTERS 


h(n) is the impulse response of the discrete filter H. It is 
expressed as 

$$h(n) = \int \phi(x/2) \, \phi(x-n) \, dx$$
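
As an illustration, the integral can be evaluated numerically for the Haar scaling function. The Python sketch below uses a midpoint rule and integrates $\phi(x/2)\,\phi(x-n)$ exactly as the integrand reads above; any normalising factor in front of the integral is not assumed, so if the definition carries one, the values scale accordingly. The function names, integration limits and step count are choices made for this sketch.

```python
def haar_phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def filter_tap(n, phi=haar_phi, lo=-4.0, hi=8.0, steps=12000):
    """Midpoint-rule estimate of the integral of phi(x/2) * phi(x - n) over [lo, hi].

    No normalising factor is applied (an assumption of this sketch).
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += phi(x / 2.0) * phi(x - n)
    return total * dx
```

For the Haar function, $\phi(x/2)$ equals 1 on [0, 2) and $\phi(x-n)$ equals 1 on [n, n+1), so only n = 0 and n = 1 give a non-zero overlap. `filter_tap(0)` and `filter_tap(1)` both evaluate to 1 up to rounding, in agreement with $c_0 = c_1 = 1$ found for the Haar basis.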
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